2008 SAILSummationbasedIncrementalLe
- (Wu et al., 2008) ⇒ Junjie Wu, Hui Xiong, and Jian Chen. (2008). “SAIL: Summation-based Incremental Learning for Information-theoretic Clustering.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401979
Quotes
Abstract
Information-theoretic clustering aims to exploit information-theoretic measures as the clustering criteria. A common practice on this topic is so-called INFO-K-means, which performs K-means clustering with the KL-divergence as the proximity function. While expert efforts on INFO-K-means have shown promising results, a remaining challenge is to deal with high-dimensional sparse data. Indeed, it is possible that the centroids contain many zero-value features for high-dimensional sparse data. This leads to infinite KL-divergence values, which create a dilemma in assigning objects to the centroids during the iteration process of K-means. To address this dilemma, in this paper, we propose a Summation-based Incremental Learning (SAIL) method for INFO-K-means clustering. Specifically, by using an equivalent objective function, SAIL replaces the computation of the KL-divergence by the computation of the Shannon entropy. This avoids the zero-value dilemma caused by the use of the KL-divergence. Our experimental results on various real-world document data sets have shown that, with SAIL as a booster, the clustering performance of K-means can be significantly improved. Also, SAIL leads to quick convergence and a robust clustering performance on high-dimensional sparse data.
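The entropy reformulation in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' SAIL implementation: documents are assumed to be nonnegative term-count vectors, each weighted by its length n_x, and `kl_divergence`, `cluster_cost`, and `sail_sketch` are hypothetical names. Under those assumptions the INFO-K-means objective Σ_x n_x D(p_x ‖ m_k) equals Σ_k n_k H(m_k) minus a constant that does not depend on the partition. Each candidate move can therefore be scored with Shannon entropies of summed cluster vectors, where a zero-valued feature contributes 0 log 0 = 0 instead of an infinite divergence.

```python
import numpy as np


def kl_divergence(p, q):
    """D(p || q) for probability vectors; +inf if q_j == 0 where p_j > 0."""
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def cluster_cost(s):
    """n * H(s / n) = n log n - sum_j s_j log s_j for a count vector s.

    Uses the convention 0 log 0 = 0, so zero-valued features are harmless.
    """
    n = s.sum()
    if n <= 0:
        return 0.0
    nz = s[s > 0]
    return float(n * np.log(n) - np.sum(nz * np.log(nz)))


def sail_sketch(X, k, max_iter=20, seed=0):
    """Hypothetical one-object-at-a-time reassignment on cluster sum vectors.

    X: (n_docs, n_terms) nonnegative term counts (assumption). Minimizes
    sum_k cluster_cost(S_k), which equals the length-weighted INFO-K-means
    objective plus a partition-independent constant.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=X.shape[0])
    sums = np.zeros((k, X.shape[1]))
    for i, c in enumerate(labels):
        sums[c] += X[i]
    for _ in range(max_iter):
        moved = False
        for i, x in enumerate(X):
            a = labels[i]
            # Change in cost from taking x out of its current cluster.
            remove = cluster_cost(sums[a] - x) - cluster_cost(sums[a])
            best, best_delta = a, 0.0
            for b in range(k):
                if b == a:
                    continue
                delta = remove + cluster_cost(sums[b] + x) - cluster_cost(sums[b])
                if delta < best_delta - 1e-12:
                    best, best_delta = b, delta
            if best != a:
                # Only two sum vectors change; no KL-divergence is ever evaluated.
                sums[a] -= x
                sums[best] += x
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels, sums


if __name__ == "__main__":
    X = np.array([[5, 3, 0, 0], [4, 4, 0, 0], [0, 0, 3, 5], [0, 0, 4, 4]], dtype=float)
    # Zero-valued features in the "centroid" make the KL-based distance infinite:
    print(kl_divergence(X[0] / X[0].sum(), X[2] / X[2].sum()))  # inf
    labels, sums = sail_sketch(X, k=2)
    print(labels)
```

The first print shows the zero-value dilemma: the two documents share no terms, so the divergence is infinite. Keeping per-cluster sum vectors means a move only updates two vectors and every score stays finite, which is one plausible reading of "summation-based"; the paper itself should be consulted for the exact update rule and visiting order.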
References
| | Author | title | doi |
|---|---|---|---|
| 2008 SAILSummationbasedIncrementalLe | Junjie Wu, Jian Chen, Hui Xiong | SAIL: Summation-based Incremental Learning for Information-theoretic Clustering | 10.1145/1401890.1401979 |