Totally Random Trees Embedding System
A Totally Random Trees Embedding System is a Decision Tree Ensemble Learning System that implements an Unsupervised Data Transformation Algorithm to solve a Totally Random Trees Embedding Task.
- Example(s): a sklearn.ensemble.RandomTreesEmbedding instance (see the scikit-learn reference below).
- Counter-Example(s): a Random Forests System (a supervised Decision Tree Ensemble Learning System).
- See: Decision Tree, Ensemble-based Learning System, Supervised Learning System, Unsupervised Learning System, Random Forests System.
References
2017
- (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/ensemble.html#random-trees-embedding Retrieved:2017-10-29
- QUOTE:
RandomTreesEmbedding implements an unsupervised transformation of the data. Using a forest of completely random trees, RandomTreesEmbedding encodes the data by the indices of the leaves a data point ends up in. This index is then encoded in a one-of-K manner, leading to a high-dimensional, sparse binary coding. This coding can be computed very efficiently and can then be used as a basis for other learning tasks. The size and sparsity of the code can be influenced by choosing the number of trees and the maximum depth per tree. For each tree in the ensemble, the coding contains one entry of one. The size of the coding is at most n_estimators * 2 ** max_depth, the maximum number of leaves in the forest.

As neighboring data points are more likely to lie within the same leaf of a tree, the transformation performs an implicit, non-parametric density estimation.
2007
- (Moosmann et al., 2007) ⇒ Moosmann, F., Triggs, B., & Jurie, F. (2007). Fast discriminative visual codebooks using randomized clustering forests. In Advances in Neural Information Processing Systems (pp. 985-992).
- ABSTRACT: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks.
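As a rough illustration of the codebook-plus-classifier pipeline the abstract describes, the sketch below pairs scikit-learn's RandomTreesEmbedding (standing in for the authors' Extremely Randomized Clustering Forests) with a linear SVM on synthetic data; it is an analogy under those assumptions, not the paper's implementation.

```python
# Hedged sketch: random trees quantize inputs into sparse "visual word"
# codes (leaf indices), and a conventional linear SVM classifies the codes.
# The dataset is synthetic, not image descriptors as in the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The embedding plays the role of the randomized clustering/codebook stage;
# the SVM is the conventional classifier applied to the sparse codes.
model = make_pipeline(
    RandomTreesEmbedding(n_estimators=50, max_depth=5, random_state=0),
    LinearSVC(),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

The appeal of this design, as the abstract notes, is speed: growing random trees is far cheaper than running k-means on large descriptor sets, while the resulting sparse codes remain effective inputs for a linear classifier.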