Gradient Tree Boosting System
A Gradient Tree Boosting System is a decision tree ensemble learning system that applies a Gradient Tree Boosting Algorithm to solve a Gradient Tree Boosting Task.
- AKA: Gradient Boosted Trees System.
- Context:
- It can range from being a Gradient Tree Boosting Classification System to being a Gradient Tree Boosting Regression System.
- …
- Example(s):
- Counter-Example(s):
- See: Naive-Bayes Training System.
References
2017a
- (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting Retrieved: 2017-10-22.
- QUOTE: Gradient Tree Boosting or Gradient Boosted Regression Trees (GBRT) is a generalization of boosting to arbitrary differentiable loss functions. GBRT is an accurate and effective off-the-shelf procedure that can be used for both regression and classification problems. Gradient Tree Boosting models are used in a variety of areas including Web search ranking and ecology.
The advantages of GBRT are:
- Natural handling of data of mixed type (= heterogeneous features)
- Predictive power
- Robustness to outliers in output space (via robust loss functions)
- The disadvantages of GBRT are:
- Scalability: due to the sequential nature of boosting, it can hardly be parallelized.
- The module sklearn.ensemble provides methods for both classification and regression via gradient boosted regression trees.
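As a concrete illustration (not part of the quoted documentation), the following minimal Python sketch exercises both estimators from sklearn.ensemble; the synthetic datasets and hyperparameter values are assumptions chosen purely for demonstration:
```python
# Minimal sketch of scikit-learn's gradient boosted tree estimators.
# Hyperparameter values here are illustrative, not recommendations.
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Classification: the predicted outcome is a discrete class label.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
clf.fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression: the predicted outcome is a real number. The 'huber' loss
# illustrates the robustness to outliers via robust loss functions noted above.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = GradientBoostingRegressor(loss='huber', n_estimators=100)
reg.fit(X_train, y_train)
print("regression R^2:", reg.score(X_test, y_test))
```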
2017b
- (Wikipedia, 2017b) ⇒ https://en.wikipedia.org/wiki/Decision_tree_learning#Decision_tree_types Retrieved: 2017-10-15.
- Decision trees used in data mining are of two main types:
- Classification tree analysis is when the predicted outcome is the class to which the data belongs.
- Regression tree analysis is when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
- The term Classification And Regression Tree (CART) analysis is an umbrella term used to refer to both of the above procedures, first introduced by Breiman et al.[1] Trees used for regression and trees used for classification have some similarities - but also some differences, such as the procedure used to determine where to split.
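As an illustration of the two tree types (not part of the quoted article), here is a minimal sketch using scikit-learn's tree module, which implements an optimized version of CART; the toy datasets are chosen only for demonstration:
```python
# Minimal sketch contrasting classification trees and regression trees
# with scikit-learn's CART-based estimators.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: the predicted outcome is a class label.
X_cls, y_cls = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3).fit(X_cls, y_cls)
print(clf.predict(X_cls[:1]))  # e.g. [0], a class index

# Regression tree: the predicted outcome is a real number.
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(max_depth=3).fit(X_reg, y_reg)
print(reg.predict(X_reg[:1]))  # a real-valued prediction
```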
Some techniques, often called ensemble methods, construct more than one decision tree:
- Boosted trees: incrementally building an ensemble by training each new instance to emphasize the training instances previously mis-modeled. A typical example is AdaBoost. These can be used for both regression-type and classification-type problems (a sketch follows this list). [2] [3]
- Bootstrap aggregated (or bagged) decision trees, an early ensemble method, builds multiple decision trees by repeatedly resampling training data with replacement, and voting the trees for a consensus prediction. [4]
- A random forest classifier is a specific type of bootstrap aggregating.
- Rotation forest - in which every decision tree is trained by first applying principal component analysis (PCA) on a random subset of the input features. [5]
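As a hedged sketch of the first three ensemble methods above, using scikit-learn's built-in estimators (rotation forest has no built-in scikit-learn estimator and is omitted; the synthetic dataset and hyperparameters are illustrative assumptions):
```python
# Minimal sketch comparing boosted, bagged, and random-forest ensembles.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

ensembles = {
    "boosted trees (AdaBoost)": AdaBoostClassifier(n_estimators=50),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    "random forest": RandomForestClassifier(n_estimators=50),
}
for name, model in ensembles.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```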
A special case of a decision tree is a decision list, which is a one-sided decision tree, so that every internal node has exactly 1 leaf node and exactly 1 internal node as a child (except for the bottommost node, whose only child is a single leaf node). While less expressive, decision lists are arguably easier to understand than general decision trees due to their added sparsity; they also permit non-greedy learning methods and allow monotonic constraints to be imposed.
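To make the decision-list structure concrete, here is a minimal sketch with hand-written, hypothetical rules (an illustration of the definition above, not a learned model):
```python
# A decision list as a chain of tests: each node tests one condition and
# has exactly one leaf (the returned label) and one child (the next test),
# except the bottommost node, whose only child is a single leaf.
def decision_list(x: dict) -> str:
    if x["age"] < 18:
        return "minor"        # leaf of the first node
    elif x["income"] > 100_000:
        return "high-income"  # leaf of the second node
    elif x["is_student"]:
        return "student"      # leaf of the third node
    else:
        return "default"      # single leaf of the bottommost node

print(decision_list({"age": 30, "income": 50_000, "is_student": False}))
```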
- ↑ Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8.
- ↑ Friedman, J. H. (1999). Stochastic gradient boosting. Stanford University.
- ↑ Hastie, T., Tibshirani, R., Friedman, J. H. (2001). The elements of statistical learning : Data mining, inference, and prediction. New York: Springer Verlag.
- ↑ Breiman, L. (1996). "Bagging Predictors." Machine Learning, 24: 123-140.
- ↑ Rodriguez, J.J. and Kuncheva, L.I. and Alonso, C.J. (2006), Rotation forest: A new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619-1630.