Spectral Clustering Algorithm
Jump to navigation
Jump to search
A Spectral Clustering Algorithm is a clustering algorithm that performs dimensionality reduction (for clustering in fewer dimensions) by means of the spectrum of the affinity matrix.
- Context:
- It can be applied by a Spectral Clustering System (that solve s a spectral clustering task).
- Example(s):
- Counter-Example(s):
- See: Kernel Principal Component Analysis; k-Way Spectral Clustering.
References
2011
- (Sammut & Webb, 2011) ⇒ Claude Sammut, and Geoffrey I. Webb. (2011). “Spectral Clustering.” In: (Sammut & Webb, 2011) p.907
- (Wikipedia - Spectral Clustering, 2011-Jun-13) ⇒ http://en.wikipedia.org/wiki/Cluster_analysis#Spectral_clustering
- QUOTE: Given a set of data points A, the similarity matrix may be defined as a matrix [math]\displaystyle{ S }[/math] where [math]\displaystyle{ S_{ij} }[/math] represents a measure of the similarity between points [math]\displaystyle{ i, j\in A }[/math]. Spectral clustering techniques make use of the spectrum of the similarity matrix of the data to perform dimensionality reduction for clustering in fewer dimensions. One such technique is the Normalized Cuts algorithm by Shi-Malik, commonly used for image segmentation. It partitions points into two sets [math]\displaystyle{ (S_1,S_2) }[/math] based on the eigenvector [math]\displaystyle{ v }[/math] corresponding to the second-smallest eigenvalue of the Laplacian matrix [math]\displaystyle{ L = I - D^{-1/2}SD^{-1/2} }[/math] of [math]\displaystyle{ S }[/math], where [math]\displaystyle{ D }[/math] is the diagonal matrix [math]\displaystyle{ D_{ii} = \sum_{j} S_{ij}. }[/math] This partitioning may be done in various ways, such as by taking the median [math]\displaystyle{ m }[/math] of the components in [math]\displaystyle{ v }[/math], and placing all points whose component in [math]\displaystyle{ v }[/math] is greater than [math]\displaystyle{ m }[/math] in [math]\displaystyle{ S_1 }[/math], and the rest in [math]\displaystyle{ S_2 }[/math]. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion. A related algorithm is the Meila-Shi algorithm, which takes the eigenvectors corresponding to the [math]\displaystyle{ k }[/math] largest eigenvalues of the matrix [math]\displaystyle{ P = SD^{-1} }[/math] for some [math]\displaystyle{ k }[/math], and then invokes another algorithm (e.g. k-means) to cluster points by their respective [math]\displaystyle{ k }[/math] components in these eigenvectors.
2003
- (Bengio et al., 2003b) ⇒ Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Le Roux, Marie Ouimet. (2003). “Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering.” In: Advances in Neural Information Processing Systems 16 (NIPS 2003).
2001
- (Ng, Jordan & Weiss, 2001) ⇒ Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. (2001). “On Spectral Clustering: Analysis and an algorithm.” In: Advances in Neural Information Processing Systems, 14 (NIPS 2001)
- QUOTE: Despite many empirical successes of spectral clustering methods - algorithms that cluster points using eigenvectors of matrices derived from the data - there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab.