k-Means Clustering Algorithm
A k-Means Clustering Algorithm is a top-down clustering algorithm that iteratively relocates cluster centroids so as to minimize the mean within-cluster sum of squares (the mean squared error).
- AKA: Forgy's Method, MacQueen's Algorithm.
- Context:
- It can be implemented by a k-Means Clustering System (to solve a k-means clustering task).
- It requires that k be provided.
- It can start by creating k initial sets, either at random or using some heuristic.
- It can iteratively refine the clusters (as sketched in the code example following this Context list) by:
- 1. Each Instance d_i is assigned to its closest Cluster Centroid.
- 2. Each Cluster Centroid C_j is updated to be the Mean of its Members.
- It can terminate after an iteration that does not change Cluster composition.
- It can (often) be applied to Metric Spaces because of the requirement to compute the Mean.
- It can (often) have a Computational Complexity of [math]\displaystyle{ \mathcal{O} (n \times K \times I \times d) }[/math], where n is the number of points, K the number of clusters, I the number of iterations, and d the number of attributes.
- It can be considered a special case of the EM Algorithm, that assumes:
- Each cluster is modeled by a spherical Gaussian distribution;
- Each data item is assigned to a single cluster;
- The (prior probability) mixture weights are equal (each cluster is similarly sized).
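For a concrete illustration, the following is a minimal Python/NumPy sketch of the iterative refinement loop described in the Context items above (a Lloyd-style k-means with Forgy-style random initialization); the function and variable names are illustrative and not taken from any particular library.

import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    # X: (n, d) array of n instances with d attributes; k: number of clusters.
    rng = np.random.default_rng(seed)
    # Create k initial centroids by sampling k instances at random (Forgy initialization).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 1: assign each instance to its closest cluster centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Terminate after an iteration that does not change cluster composition.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: update each cluster centroid to be the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels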
- Example(s):
- See: Vector Quantization, Voronoi Cell, NP-Hard, Heuristic Algorithm, Mixture Model, Gaussian Distribution.
References
2014
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/k-means_clustering Retrieved:2014-10-20.
- k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/K-means_clustering#Description Retrieved:2014-10-20.
- Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS). In other words, its objective is to find: [math]\displaystyle{ \underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2 }[/math] where μi is the mean of points in Si.
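For concreteness, a small Python/NumPy sketch of this WCSS objective follows, assuming X is an n×d array of observations, labels holds each observation's cluster index, and means holds the k cluster means μi; all names are illustrative.

import numpy as np

def within_cluster_sum_of_squares(X, labels, means):
    # Sum of squared Euclidean distances from each observation to its cluster mean.
    return sum(
        float(np.sum((X[labels == i] - means[i]) ** 2))
        for i in range(len(means))
    )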
2009
- (Ghosh & Liu, 2009) ⇒ Joydeep Ghosh and Alexander Liu. (2009). “K-Means.” In: (Wu & Kumar, 2009)
2004
- (Ding & He, 2004) ⇒ Chris Ding, and Xiaofeng He. (2004). “K-means Clustering via Principal Component Analysis.” In: Proceedings of Int'l Conf. Machine Learning (ICML 2004).
- http://www.nature.com/nrg/journal/v5/n6/glossary/nrg1349_glossary.html
- An unsupervised clustering technique. Data points are partitioned into a predetermined number of non-hierarchical clusters based on similarity.
2001
- (Zha et al., 2001) ⇒ H. Zha, Chris Ding, M. Gu, Xiaofeng He, and H.D. Simon. (2001). “Spectral Relaxation for K-means Clustering.” In: Neural Information Processing Systems vol.14 (NIPS 2001).
- (Wagstaff et al., 2001) ⇒ Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schrödl. (2001). “Constrained K-means Clustering with Background Knowledge.” In: Proceedings of the Eighteenth International Conference on Machine Learning.
1999
- (Larsen & Aone, 1999) ⇒ B. Larsen, and C. Aone. (1999). “Fast and Effective Text Mining using Linear-time Document Clustering.” In: Proceedings of KDD 1999.
1998
- (Neal & Hinton, 1998) ⇒ R. Neal, and G. Hinton. (1998). “A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants.” In: Michael I. Jordan (Ed.), Learning in Graphical Models. Kluwer, 1998.
- Allows for application of k-Means/EM to non-continuous metric spaces
1992
- (Cutting et al., 1992) ⇒ D. R. Cutting, D. R. Karger, J. O. Pedersen, and John W. Tukey. (1992). “Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections.” In: Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1992).
1990
- (Kaufman & Rousseeuw, 1990) ⇒ L. Kaufman, and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, March 1990.
1979
- (Hartigan & Wong, 1979) ⇒ J. A. Hartigan, and M. A. Wong. (1979). “A K-means Clustering Algorithm.” In: Journal of the Royal Statistical Society, Series C (Applied Statistics).
1973
- (Duda & Hart, 1973) ⇒ R. O. Duda, and P. E. Hart. (1973). “Pattern Classification and Scene Analysis.” New York: John Wiley and Sons.
1967
- (MacQueen, 1967) ⇒ J. B. MacQueen. (1967). “Some Methods for Classification and Analysis of Multivariate Observations.” In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281-297).
1965
- (Forgy, 1965) ⇒ E. Forgy. (1965). “Cluster Analysis of Multivariate Data: Efficiency vs. interpretability of classifications.” In: Biometrics 21:768.
1956
- (Steinhaus, 1956) ⇒ H. Steinhaus. (1956). “Sur la division des corps matériels en parties.” In: Bull. Acad. Polon. Sci., Cl. III, vol. IV.