2006 MeasuringConstraintSetUtilityfo
- (Davidson et al., 2006) ⇒ Ian Davidson, Kiri L. Wagstaff, and Sugato Basu. (2006). “Measuring Constraint-Set Utility for Partitional Clustering Algorithms.” In: Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery. Knowledge Discovery in Databases: PKDD 2006. ISBN:978-3-540-45374-1 doi:10.1007/11871637_15
Subject Headings: Cluster Performance; Rand Index; Cluster Accuracy; Partitional Cluster Algorithm; High Informativeness.
Notes
Cited By
Quotes
Abstract
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves performance, with respect to the true data labels. However, in most of these experiments, results are averaged over different randomly chosen constraint sets, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
References
- Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: Proceedings of the Eighteenth International Conference on Machine Learning (2001)
- Klein, D., Kamvar, S.D., Manning, C.D.: From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In: Proceedings of the Nineteenth International Conference on Machine Learning (2002)
- Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning, with application to clustering with side-information. NIPS 15 (2003)
- Basu, S., Bilenko, M., Mooney, R.J.: A probabilistic framework for semi-supervised clustering. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA (2004)
- Bar-Hillel, A., Hertz, T., Shental, N., Weinshall, D.: Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research 6 (2005)
- Wagstaff, K.L.: Intelligent Clustering with Instance-Level Constraints. PhD thesis, Cornell University (2002)
- Lu, Z., Leen, T.K.: Semi-supervised learning with penalized probabilistic clustering. In: Advances in Neural Information Processing Systems 17 (2005)
- Davidson, I., Ravi, S.S.: Clustering with constraints: Feasibility issues and the k-means algorithm. In: Proceedings of the 2005 SIAM International Conference on Data Mining (2005)
- Mitchell, T.: Machine Learning. McGraw Hill, New York (1997)
- Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
- Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(366) (1971);
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
2006 MeasuringConstraintSetUtilityfo | Sugato Basu Ian Davidson Kiri L. Wagstaff | Measuring Constraint-Set Utility for Partitional Clustering Algorithms | 10.1007/11871637_15 | 2006 |