KD-Tree Clustering Algorithm

Context:
- It provide for efficient EM-style clustering of many items.
- It is a High-Dimensionality Clustering Algorithm.
Example(s):
- …
Counter-Example(s):
See: Hierarchical Clustering, Graph Theory, Expectation-Maximization Algorithm, Tree Data Structure, Nearest Neighbor Search, Binary Space Partitioning, Clustering Algorithm, Decision Tree.

References

(Blekas & Likas, 2004) ⇒ Konstantinos Blekas, and Aristidis Likas (2004, May). "Incremental mixture learning for clustering discrete data". In Hellenic Conference on Artificial Intelligence (pp. 210-219). Springer, Berlin, Heidelberg.
- QUOTE: A kd-tree defines a recursive binary partitioning of a k-dimensional dataset, where the root node contains all data. A subset of the original dataset is assigned to each tree node and the tree construction procedure proceeds by partitioning the subset of a node into two subsets using a hyperplane perpendicular to the direction for which the subset data demonstrate the highest variance.

(Moore, 1998) ⇒ Andrew Moore. (1998). “Very Fast EM-Based Mixture Model Clustering Using Multiresolution kd-Trees.” In: ProceedingsConf. Neural Information Processing Systems (NIPS 1998).
- ABSTRACT: Clustering is important in many fields including manufacturing, biology, finance, and astronomy. Mixture models are a popular approach due to their statistical foundations, and EM is a very popular method for finding mixture models. EM, however, requires many accesses of the data, and thus has been dismissed as impractical for data mining of enormous datasets. We present a new algorithm, based on the multiresolution kd-trees of our earlier work, which dramatically reduces the cost of EM-based clustering, with savings rising linearly with the number of datapoints. Although presented here for maximum likelihood estimation of Gaussian mixture models, it is also applicable to non-Gaussian models (provided class densities are monotonic in Mahalanobis distance), mixed categorical/numeric clusters, and Bayesian methods such as Autoclass.

Hanan Samet. (1990). “The Design and Analysis of Spatial Data Structures.” Addison-Wesley.