2014 ScalableHistogramsonLargeProbab
- (Tang & Li, 2014) ⇒ Mingwang Tang and Feifei Li. (2014). “Scalable Histograms on Large Probabilistic Data.” In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2014). ISBN:978-1-4503-2956-9 doi:10.1145/2623330.2623640
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222014%22+Scalable+Histograms+on+Large+Probabilistic+Data
- http://dl.acm.org/citation.cfm?id=2623330.2623640&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
Histogram construction is a fundamental problem in data management, and a good histogram supports numerous mining operations. Recent work has extended histograms to probabilistic data. However, constructing histograms for probabilistic data can be extremely expensive, and existing studies suffer from limited scalability. This work designs novel approximation methods to construct scalable histograms on probabilistic data. We show that our methods provide constant approximations compared to the optimal histograms produced by the state-of-the-art in the worst case. We also extend our methods to parallel and distributed settings so that they can run gracefully in a cluster of commodity machines. We introduced novel synopses to reduce communication cost when running our methods in such settings. Extensive experiments on large real data sets have demonstrated the superb scalability and efficiency achieved by our methods, when compared to the state-of-the-art methods. They also achieved excellent approximation quality in practice.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2014 ScalableHistogramsonLargeProbab | Mingwang Tang, Feifei Li | | | Scalable Histograms on Large Probabilistic Data | | | | 10.1145/2623330.2623640 | | 2014 |