2012 StratifiedKMeansClusteringovera
- (Liu & Agrawal, 2012) ⇒ Tantan Liu, and Gagan Agrawal. (2012). “Stratified K-means Clustering over a Deep Web Data Source.” In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2012). ISBN:978-1-4503-1462-6 doi:10.1145/2339530.2339705
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222012%22+Stratified+K-means+Clustering+over+a+Deep+Web+Data+Source
- http://dl.acm.org/citation.cfm?id=2339530.2339705&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
This paper focuses on the problem of clustering data from a hidden, or deep web, data source. A key characteristic of deep web data sources is that data can only be accessed through the limited query interface they support. Because the underlying data set cannot be accessed directly, data mining must be performed on samples of the data set. The samples, in turn, can only be obtained by querying the deep web databases with specific inputs.
We have developed a new stratified clustering method addressing this problem for a deep web data source. Specifically, we have developed a stratified k-means clustering method. In our approach, the space of input attributes of a deep web data source is stratified to capture the relationship between the input and the output attributes. The space of output attributes of a deep web data source is partitioned into sub-spaces. Three representative sampling methods are developed in this paper, with the goal of achieving a good estimation of the statistics, including proportions and centers, within the sub-spaces of the output attributes.
We have evaluated our methods using two synthetic and two real datasets. Our comparison shows significant gains in estimation accuracy from both novel aspects of our work, i.e., the use of stratification (5%-55%) and our representative sampling methods (up to 54%).
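The stratified estimation of sub-space proportions and centers described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm and does not implement their three representative sampling methods. It assumes, hypothetically, a single categorical input attribute whose values define the strata, known stratum sizes N_h, and a query interface that returns random matching records. Sub-spaces are taken to be the Voronoi cells of k centers, and each sampled point carries the standard stratified expansion weight N_h / n_h, so a weighted Lloyd (k-means) iteration yields estimates of both the proportion and the center of each sub-space. All names (`make_hidden_source`, `query_source`, `stratified_sample`, `weighted_kmeans`) are hypothetical.

```python
import random

def make_hidden_source(seed=0):
    """Hypothetical hidden table of (input_value, output_point) records."""
    rng = random.Random(seed)
    # Stratum "A": 900 records near (0, 0); stratum "B": 100 records near (10, 10).
    records = [("A", (rng.gauss(0, 1), rng.gauss(0, 1))) for _ in range(900)]
    records += [("B", (rng.gauss(10, 1), rng.gauss(10, 1))) for _ in range(100)]
    return records

def query_source(records, input_value, limit, rng):
    """Mimic a query form: return up to `limit` random outputs matching input_value."""
    matches = [out for inp, out in records if inp == input_value]
    return rng.sample(matches, min(limit, len(matches)))

def stratified_sample(records, stratum_sizes, per_stratum, rng):
    """Query each stratum separately; weight each returned point by N_h / n_h."""
    sample = []
    for value, size in stratum_sizes.items():
        rows = query_source(records, value, per_stratum, rng)
        if not rows:
            continue  # skip strata whose query returned nothing
        weight = size / len(rows)
        sample.extend((weight, point) for point in rows)
    return sample

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def weighted_kmeans(sample, centers, iters=20):
    """Lloyd iterations where each sampled point counts with its stratum weight.
    Returns (centers, proportions) for the k sub-spaces (Voronoi cells)."""
    centers = [tuple(c) for c in centers]
    k, dim = len(centers), len(centers[0])
    for _ in range(iters):
        mass = [0.0] * k
        sums = [[0.0] * dim for _ in range(k)]
        for weight, point in sample:
            j = min(range(k), key=lambda c: sq_dist(point, centers[c]))
            mass[j] += weight
            for d in range(dim):
                sums[j][d] += weight * point[d]
        # Keep the previous center if a sub-space received no weight.
        centers = [tuple(s / m for s in sums[j]) if m > 0 else centers[j]
                   for j, m in enumerate(mass)]
    total = sum(mass)
    return centers, [m / total for m in mass]

rng = random.Random(42)
records = make_hidden_source()
sample = stratified_sample(records, {"A": 900, "B": 100}, per_stratum=50, rng=rng)
centers, props = weighted_kmeans(sample, [(1.0, 1.0), (9.0, 9.0)])
print(props)  # about [0.9, 0.1]
```

Both strata are sampled equally here, so stratum B is oversampled relative to its size. Dropping the weights (setting every weight to 1) would put the estimated proportion of the second sub-space near 0.5 instead of 0.1; the expansion weights correct for that imbalance.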
References
| | Author | title | doi | year |
|---|---|---|---|---|
| 2012 StratifiedKMeansClusteringovera | Gagan Agrawal, Tantan Liu | Stratified K-means Clustering over a Deep Web Data Source | 10.1145/2339530.2339705 | 2012 |