2002 ClusterEnsemblesB


Subject Headings: Cluster Ensemble, Consensus Clustering, Normalized Mutual Information.

Notes

Cited By

Quotes

Author Keywords

cluster analysis, clustering, partitioning, unsupervised learning, multi-learner systems, ensemble, mutual information, consensus functions, knowledge reuse

Abstract

This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant `knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.
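The following is a minimal, illustrative sketch of the idea behind the first combiner mentioned in the abstract: the r input labelings induce a pairwise similarity, namely the fraction of partitionings in which two objects land in the same cluster, and the objects can then be re-clustered using that induced similarity. The function name coassociation_matrix and the use of plain NumPy are assumptions for illustration, not the paper's implementation.

    import numpy as np

    def coassociation_matrix(labelings):
        """n-by-n induced similarity: the fraction of the r input
        partitionings in which objects i and j share a cluster.
        (Illustrative helper, not from the paper.)"""
        labelings = np.asarray(labelings)          # shape (r, n)
        r, n = labelings.shape
        S = np.zeros((n, n))
        for lam in labelings:                      # one labeling per row
            S += (lam[:, None] == lam[None, :])    # 1 where labels agree
        return S / r                               # average over the ensemble

    # Two 6-object labelings that agree up to a label permutation:
    ensemble = [[0, 0, 0, 1, 1, 1],
                [1, 1, 1, 0, 0, 0]]
    print(coassociation_matrix(ensemble))          # 1s within each group, 0s across

Any similarity-based clustering algorithm can then be applied to the resulting matrix to obtain the combined clustering.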


2.2 Objective Function for Cluster Ensembles

Mutual information, which is a symmetric measure to quantify the statistical information shared between two distributions (Cover and Thomas, 1991), provides a sound indication of the shared information between a pair of clusterings. Let X and Y be the random variables described by the cluster labeling [math]\displaystyle{ \lambda^{(a)} }[/math] and [math]\displaystyle{ \lambda^{(b)} }[/math], with [math]\displaystyle{ k^{(a)} }[/math] and [math]\displaystyle{ k^{(b)} }[/math] groups respectively. Let [math]\displaystyle{ I(X,Y) }[/math] denote the mutual information between [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math], and [math]\displaystyle{ H(X) }[/math] denote the entropy of [math]\displaystyle{ X }[/math]. One can show that [math]\displaystyle{ I(X,Y) }[/math] is a metric. There is no upper bound for [math]\displaystyle{ I(X,Y) }[/math], so for easier interpretation and comparisons, a normalized version of [math]\displaystyle{ I(X,Y) }[/math] that ranges from 0 to 1 is desirable. Several normalizations are possible based on the observation that [math]\displaystyle{ I(X,Y) \le min(H(X),H(Y)) }[/math]. These include normalizing using the arithmetic or geometric mean of [math]\displaystyle{ H(X) }[/math] and [math]\displaystyle{ H(Y) }[/math]. Since [math]\displaystyle{ H(X) = I(X,X) }[/math], we prefer the geometric mean because of the analogy with a normalized inner product in Hilbert space. Thus the normalized mutual information (NMI)[1] used is

[math]\displaystyle{ NMI(X,Y) = \frac{I(X,Y)}{\sqrt{H(X)H(Y)} }. }[/math] (2)

One can see that [math]\displaystyle{ NMI(X,X) = 1 }[/math], as desired. Equation 2 needs to be estimated by the sampled quantities provided by the clusterings. …
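As the last sentence notes, Equation 2 is computed from sample estimates: the entropies and the mutual information are plugged in from the empirical label counts. Below is a minimal sketch of that plug-in estimate, assuming two equal-length NumPy label vectors; the helper names entropy, mutual_information, and nmi are illustrative, not from the paper.

    import numpy as np

    def entropy(labels):
        """Plug-in entropy of a label vector (natural log)."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))

    def mutual_information(a, b):
        """Plug-in mutual information between two label vectors."""
        n = len(a)
        mi = 0.0
        for ca in np.unique(a):
            for cb in np.unique(b):
                n_ab = np.sum((a == ca) & (b == cb))   # joint count
                if n_ab == 0:
                    continue
                n_a, n_b = np.sum(a == ca), np.sum(b == cb)
                mi += (n_ab / n) * np.log(n * n_ab / (n_a * n_b))
        return mi

    def nmi(a, b):
        """Equation 2: I(X,Y) normalized by the geometric mean of entropies."""
        h_a, h_b = entropy(a), entropy(b)
        if h_a == 0.0 or h_b == 0.0:
            return 0.0  # convention for a degenerate one-cluster labeling
        return mutual_information(a, b) / np.sqrt(h_a * h_b)

    a = np.array([0, 0, 0, 1, 1, 1])
    print(nmi(a, a))  # 1.0, since I(X,X) = H(X)

In the paper, this pairwise NMI is then averaged over all input labelings to form the shared-information objective that the consensus functions seek to maximize.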

  1. Our earlier work (Strehl and Ghosh, 2002a) used a slightly different normalization as only balanced clusters were desired: [math]\displaystyle{ NMI(X,Y) = \frac{2 \cdot I(X,Y)}{\log k^{(a)} + \log k^{(b)}} }[/math], i.e., using the arithmetic mean and assuming maximum entropy caused by perfect balancing.
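For comparison, the footnote's earlier normalization can be written with the same illustrative helper sketched above, where k_a and k_b are the numbers of groups in each labeling:

    def nmi_balanced(a, b, k_a, k_b):
        """Footnote 1: normalize by the arithmetic mean of the maximum
        entropies log k, assuming perfectly balanced clusters.
        (Illustrative, reuses the mutual_information sketch above.)"""
        return 2.0 * mutual_information(a, b) / (np.log(k_a) + np.log(k_b))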


Strehl, Alexander, and Joydeep Ghosh. "Cluster Ensembles: A Knowledge Reuse Framework for Combining Partitions." Journal of Machine Learning Research, 3 (2002). http://jmlr.csail.mit.edu/papers/volume3/strehl02a/strehl02a.pdf