Heterogeneous Dataset
A Heterogeneous Dataset is a dataset with more than one data source (with different data schemas).
- AKA: Non-Homogeneous Data.
- Context:
- It can range from being a Simple Heterogeneous Dataset to being a Complex Heterogeneous Dataset.
- It can be an input to a Heterogeneous Data Analysis Task.
- …
- Example(s):
- Counter-Example(s):
- See: Complex Dataset, Domain-Dependent, Heterogeneous, Heterogeneous Data Analytics.
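The definition above (more than one data source, with different data schemas) can be illustrated with a minimal sketch. The two sources, their field names, and the helper functions `schema_of` and `is_heterogeneous` are hypothetical and exist only for illustration. Treating a schema as the set of field names and value types is an assumption, not a standard definition.

```python
# Hypothetical sources: the names and fields below are illustrative assumptions.
crm_records = [
    {"customer_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"customer_id": 2, "name": "Lin", "email": "lin@example.com"},
]
web_log_records = [
    {"session": "s-01", "url": "/home", "duration_ms": 5300},
    {"session": "s-02", "url": "/cart", "duration_ms": 1200},
]


def schema_of(records):
    """Return the set of (field name, value type name) pairs seen in one source."""
    return frozenset(
        (field, type(value).__name__)
        for record in records
        for field, value in record.items()
    )


def is_heterogeneous(sources):
    """Return True if the sources do not all share a single schema."""
    distinct_schemas = {schema_of(records) for records in sources.values()}
    return len(distinct_schemas) > 1


# A dataset assembled from two sources whose schemas differ.
dataset = {"crm": crm_records, "web_log": web_log_records}
print(is_heterogeneous(dataset))
```

Under this simplified view, a dataset built from a single source, or from several sources that share one schema, would not count as heterogeneous.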
References
1998
- (Fayyad, 1998) ⇒ Usama M. Fayyad. (1998). “Mining Databases: Towards Algorithms for Knowledge Discovery.” In: IEEE Data Engineering Bulletin, 21.
- QUOTE: Often mining is desirable over non-homogenous data sets (including mixtures of multimedia, video, and text modalities); current methods assume fairly uniform and simple data structure.
1997
- (Brin et al., 1997) ⇒ Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. (1997). “Dynamic Itemset Counting and Implication Rules for Market Basket Data.” In: Proceedings of the 1997 ACM SIGMOD International Conference on Management of data (SIGMOD 1997). doi:10.1145/253260.253325
- QUOTE: Non-homogeneous Data: One weakness of DIC is that it is sensitive to how homogeneous the data is. In particular, if the data is very correlated, we may not realize that an itemset is actually large until we have counted it in most of the database. If this happens, then we will not shift our hypothetical boundary and start counting some of the itemset's supersets until we have almost finished counting the itemset. As it turns out, the census data we used is ordered by census district and exactly this problem occurs. …