Bagged Trees Algorithm
A Bagged Trees Algorithm is a bootstrap aggregating (bagging) algorithm that uses a decision tree learning algorithm as its base learner.
References
2015
- http://en.wikipedia.org/wiki/Random_forest#Tree_bagging
- The training algorithm for random forests applies the general technique of bootstrap aggregating, or bagging, to tree learners. Given a training set X = x1, …, xn with responses Y = y1, …, yn, bagging repeatedly selects a random sample with replacement of the training set and fits trees to these samples … After training, predictions for unseen samples x' can be made by averaging the predictions from all the individual regression trees on x': [math]\hat{f} = \frac{1}{B} \sum_{b=1}^B \hat{f}_b(x')[/math] or by taking the majority vote in the case of decision trees.
This bootstrapping procedure leads to better model performance because it decreases the variance of the model, without increasing the bias. This means that while the predictions of a single tree are highly sensitive to noise in its training set, the average of many trees is not, as long as the trees are not correlated. Simply training many trees on a single training set would give strongly correlated trees (or even the same tree many times, if the training algorithm is deterministic); bootstrap sampling is a way of de-correlating the trees by showing them different training sets.
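The bagging procedure described above can be summarized in a short sketch. The following Python example is illustrative only, not a reference implementation: it assumes NumPy and scikit-learn's DecisionTreeRegressor are available, and the function names bagged_trees_fit and bagged_trees_predict are hypothetical names introduced here. It fits B trees on bootstrap resamples and averages their predictions, matching the formula [math]\hat{f} = \frac{1}{B} \sum_{b=1}^B \hat{f}_b(x')[/math].
```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees_fit(X, Y, B=100, random_state=0):
    """Fit B regression trees, each on a bootstrap sample (with replacement) of (X, Y)."""
    rng = np.random.default_rng(random_state)
    n = len(X)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap sample: n draws with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], Y[idx]))
    return trees

def bagged_trees_predict(trees, X_new):
    """Average the individual tree predictions: f_hat(x') = (1/B) * sum_b f_hat_b(x')."""
    return np.mean([t.predict(X_new) for t in trees], axis=0)
```
For classification, each tree's predicted class would be tallied and the majority vote returned instead of the mean, as noted in the quoted passage.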
2006
- (Caruana & Niculescu-Mizil, 2006) ⇒ Rich Caruana, and Alexandru Niculescu-Mizil. (2006). “An Empirical Comparison of Supervised Learning Algorithms.” In: Proceedings of the 23rd International Conference on Machine learning. ISBN:1-59593-383-2 doi:10.1145/1143844.1143865
- QUOTE: A number of supervised learning methods have been introduced in the last decade. Unfortunately, the last comprehensive empirical evaluation of supervised learning was the Statlog Project in the early 90's. We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We also examine the effect that calibrating the models via Platt Scaling and Isotonic Regression has on their performance.