2015 TurnWasteIntoWealthOnSimultaneo

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Dirty data commonly exist. Simply discarding a large number of inaccurate points (as noises) could greatly affect clustering results. We argue that dirty data can be repaired and utilized as strong supports in clustering. To this end, we study a novel problem of clustering and repairing over dirty data at the same time. Referring to the minimum change principle in data repairing, the objective is to find a minimum modification of inaccurate points such that the large amount of dirty data can enhance the clustering. We show that the problem can be formulated as an integer linear programming (ILP) problem. Efficient approximation is then devised by a linear programming (LP) relaxation. In particular, we illustrate that an optimal solution of the LP problem can be directly obtained without calling a solver. A quadratic time approximation algorithm is developed based on the aforesaid LP solution. We further advance the algorithm to linear time cost, where a trade-off between effectiveness and efficiency is enabled. Empirical results demonstrate that both the clustering and cleaning accuracies can be improved by our approach of repairing and utilizing the dirty data in clustering.

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2015 TurnWasteIntoWealthOnSimultaneoShaoxu Song
Chunping Li
Xiaoquan Zhang
Turn Waste Into Wealth: On Simultaneous Clustering and Cleaning over Dirty Data10.1145/2783258.27833172015