2015 TurnWasteIntoWealthOnSimultaneo
- (Song et al., 2015) ⇒ Shaoxu Song, Chunping Li, and Xiaoquan Zhang. (2015). “Turn Waste Into Wealth: On Simultaneous Clustering and Cleaning over Dirty Data.” In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015). ISBN:978-1-4503-3664-2 doi:10.1145/2783258.2783317
Cited By
- http://scholar.google.com/scholar?q=%222015%22+Turn+Waste+Into+Wealth%3A+On+Simultaneous+Clustering+and+Cleaning+over+Dirty+Data
- http://dl.acm.org/citation.cfm?id=2783258.2783317&preflayout=flat#citedby
Quotes
Abstract
Dirty data commonly exist. Simply discarding a large number of inaccurate points (as noises) could greatly affect clustering results. We argue that dirty data can be repaired and utilized as strong supports in clustering. To this end, we study a novel problem of clustering and repairing over dirty data at the same time. Referring to the minimum change principle in data repairing, the objective is to find a minimum modification of inaccurate points such that the large amount of dirty data can enhance the clustering. We show that the problem can be formulated as an integer linear programming (ILP) problem. Efficient approximation is then devised by a linear programming (LP) relaxation. In particular, we illustrate that an optimal solution of the LP problem can be directly obtained without calling a solver. A quadratic time approximation algorithm is developed based on the aforesaid LP solution. We further advance the algorithm to linear time cost, where a trade-off between effectiveness and efficiency is enabled. Empirical results demonstrate that both the clustering and cleaning accuracies can be improved by our approach of repairing and utilizing the dirty data in clustering.
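The abstract's central idea, repairing inaccurate points under a minimum change principle rather than discarding them as noise, can be illustrated with a toy sketch. This is not the ILP/LP-based algorithm of the paper. It is a simple greedy heuristic under assumed density-style definitions, and the names `neighbors`, `repair_noise`, `eps`, and `eta` are hypothetical, chosen only for this illustration.

```python
import math

def neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def repair_noise(points, eps, eta):
    """Toy greedy repair (an assumption, not the paper's method).

    A point with at least `eta` neighbors within `eps` is treated as a core
    point. A point farther than `eps` from every core point is treated as
    noise and moved onto its nearest core point, so the change per point is
    as small as this heuristic allows. Returns the repaired points and the
    total distance moved.
    """
    core = {i for i in range(len(points)) if len(neighbors(points, i, eps)) >= eta}
    repaired = list(points)
    total_cost = 0.0
    if not core:
        return repaired, total_cost  # nothing to anchor repairs to
    for i, p in enumerate(points):
        if i in core:
            continue
        nearest = min(core, key=lambda c: math.dist(p, points[c]))
        d = math.dist(p, points[nearest])
        if d <= eps:
            continue  # already supported by a core point; leave unchanged
        repaired[i] = points[nearest]
        total_cost += d
    return repaired, total_cost
```

Calling `repair_noise(points, eps, eta)` on a list of coordinate tuples returns the repaired list together with the total repair cost, which is the quantity a minimum-change objective would try to keep small.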
References
| | Author | title | doi | year |
|---|---|---|---|---|
| 2015 TurnWasteIntoWealthOnSimultaneo | Shaoxu Song, Chunping Li, Xiaoquan Zhang | Turn Waste Into Wealth: On Simultaneous Clustering and Cleaning over Dirty Data | 10.1145/2783258.2783317 | 2015 |