2003 InformationTheoreticCoClustering
- (Dhillon et al., 2003) ⇒ Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha. (2003). “Information-Theoretic Co-Clustering.” In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003). doi:10.1145/956750.956764
Subject Headings: Co-clustering Algorithm.
Notes
Cited By
Quotes
Abstract
Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory --- the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.
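The formulation in the abstract can be illustrated with a small sketch: normalize a contingency table into an empirical joint distribution, aggregate it under given row and column cluster labels, and compare the mutual information before and after. This is not the paper's algorithm; it only evaluates the objective for fixed labelings. The toy table and the helper names (`normalize`, `mutual_information`, `coclustered_joint`) are assumptions made for illustration.

```python
from math import log2


def normalize(counts):
    """Turn a table of co-occurrence counts into an empirical joint distribution."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution given as a list of rows."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing
                mi += p * log2(p / (row_marg[i] * col_marg[j]))
    return mi


def coclustered_joint(joint, row_labels, col_labels, k, l):
    """Sum probability mass into a k-by-l table of (row cluster, column cluster) cells."""
    agg = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            agg[row_labels[i]][col_labels[j]] += p
    return agg


# Hypothetical 4x4 word-document count table with two obvious blocks.
counts = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
joint = normalize(counts)

# A labeling that follows the blocks, and one that cuts across them.
good = coclustered_joint(joint, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad = coclustered_joint(joint, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)

mi_full = mutual_information(joint)
mi_good = mutual_information(good)
mi_bad = mutual_information(bad)
print(mi_full, mi_good, mi_bad)
```

On this toy table the block-aligned labeling keeps all of the original mutual information, while the labeling that cuts across the blocks keeps none. Aggregation can never increase mutual information, so the objective amounts to choosing the labeling that loses the least.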
References
 | Author | title | titleUrl | doi | year |
---|---|---|---|---|---|
2003 InformationTheoreticCoClustering | Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha | Information-Theoretic Co-Clustering | http://almaden.ibm.com/cs/people/dmodha/kdd cocluster.ps | 10.1145/956750.956764 | 2003 |