Bagging Classification System
A Bagging Classification System is a Bagged Trees System for solving a classification problem.
- AKA: Bagging Classifier, Bagged Trees Classification System, Bagged Trees Classifier.
- Context:
- It can solve Decision Tree Ensemble Learning Tasks and Bagging Classification Tasks by implementing a Bagging Algorithm.
- Example(s):
- Counter-Example(s):
- See: Decision Tree, Ensemble Learning System, Classification Task, Regression Task.
References
2017a
- (Sammut & Webb, 2017) ⇒ Claude Sammut and Geoffrey I. Webb. (2017). "Bagging". In: "Encyclopedia of Machine Learning and Data Mining" (Editors: Claude Sammut, Geoffrey I. Webb) pp 97-98.
- QUOTE: Bagging is an ensemble learning technique. The name “Bagging” is an acronym derived from Bootstrap AGGregatING. Each member of the ensemble is constructed from a different training dataset. Each dataset is a bootstrap sample from the original. The models are combined by a uniform average or vote. Bagging works best with unstable learners, that is those that produce differing generalization patterns with small changes to the training data. Bagging therefore tends not to work well with linear models. See ensemble learning for more details.
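The procedure described in the quote above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the encyclopedia's own code: the decision-stump base learner, the toy one-dimensional data, and the helper names (fit_stump, stump_predict, fit_bagging, bagging_predict) are all hypothetical choices made for brevity.

```python
import random
from collections import Counter


def fit_stump(points, labels):
    """Return the (feature, threshold, left_label, right_label) with fewest training errors."""
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def stump_predict(stump, point):
    f, t, left_label, right_label = stump
    return left_label if point[f] <= t else right_label


def fit_bagging(points, labels, n_models, rng):
    """Train each ensemble member on its own bootstrap sample of the training data."""
    n = len(points)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_stump([points[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, point):
    """Combine the members by a uniform (unweighted) majority vote."""
    votes = Counter(stump_predict(m, point) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (an assumption for illustration): class "a" on the left, "b" on the right.
rng = random.Random(0)
points = [(0.1,), (0.2,), (0.3,), (0.4,), (0.6,), (0.7,), (0.8,), (0.9,)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
ensemble = fit_bagging(points, labels, n_models=25, rng=rng)
```

Calling bagging_predict(ensemble, (0.05,)) then returns the majority label among the 25 stumps for that point. A stump is used here only because it is short to write; any learner whose fitted model changes noticeably between bootstrap samples could be substituted.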
2017b
- (Sammut & Webb, 2017) ⇒ "Ensemble Learning". In: "Encyclopedia of Machine Learning and Data Mining" (Editors: Claude Sammut, Geoffrey I. Webb) pp 393-402.
- QUOTE: In the Bagging algorithm (Breiman 1996) each member of the ensemble is constructed from a different training dataset, and the predictions combined either by uniform averaging or voting over class labels. Each dataset is generated by sampling from the total N data examples, choosing N items uniformly at random with replacement. Each sample is known as a bootstrap; the name Bagging is an acronym derived from Bootstrap AGGregatING. Since a bootstrap samples N items uniformly at random with replacement, the probability of any individual data item not being selected is [math]\displaystyle{ p = (1 - 1/N)^N }[/math]. Therefore with large N, a single bootstrap is expected to contain approximately 63.2% of the original set, while 36.8% of the originals are not selected.
Like many ensemble methods, Bagging works best with unstable models, that is those that produce differing generalization behavior with small changes to the training data. These are also known as high variance models, examples of which are decision trees and neural networks. Bagging therefore tends not to work well with very simple models. In effect, Bagging samples randomly from the space of possible models to make up the ensemble – with very simple models the sampling produces almost identical (low diversity) predictions.
Despite its apparent capability for variance reduction, situations have been demonstrated where Bagging can converge without affecting variance (see Brown et al. 2005). Several other explanations have been proposed for Bagging’s success, including links to Bayesian model averaging. In summary, it seems that several years from its introduction, despite its apparent simplicity, Bagging is still not fully understood.
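The 63.2% figure in the quote can be checked numerically. The sketch below compares the closed form 1 - (1 - 1/N)^N with the share of distinct items in one simulated bootstrap sample; the function names, the seed, and the choice N = 10,000 are assumptions made for illustration.

```python
import random


def prob_not_selected(n):
    """Probability that a given item is missed by n uniform draws with replacement."""
    return (1 - 1 / n) ** n


def unique_fraction(n, rng):
    """Fraction of the n original items that appear in one simulated bootstrap sample."""
    sample = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
    return len(set(sample)) / n


rng = random.Random(0)  # fixed seed only for reproducibility
n = 10_000
expected = 1 - prob_not_selected(n)  # closed-form share of items that are selected
observed = unique_fraction(n, rng)   # share seen in one simulated bootstrap

print(f"expected: {expected:.4f}, observed: {observed:.4f}")
```

As N grows, (1 - 1/N)^N approaches 1/e, which is where the 36.8% share of unselected items comes from.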
2017c
- (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/ensemble.html#bagging-meta-estimator Retrieved: 2017-10-22.
- QUOTE: In ensemble algorithms, bagging methods form a class of algorithms which build several instances of a black-box estimator on random subsets of the original training set and then aggregate their individual predictions to form a final prediction. These methods are used as a way to reduce the variance of a base estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it. In many cases, bagging methods constitute a very simple way to improve with respect to a single model, without making it necessary to adapt the underlying base algorithm. As they provide a way to reduce overfitting, bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g., shallow decision trees).
Bagging methods come in many flavours but mostly differ from each other by the way they draw random subsets of the training set:
- When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [B1999].
- When samples are drawn with replacement, then the method is known as Bagging [B1996].
- When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [H1998].
- Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [LG2012].
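The four flavours listed above differ only in how row (sample) and column (feature) indices are drawn for each base estimator. A minimal sketch of that index drawing follows; the function draw_subset, its parameters, and the 100-by-8 dataset shape are hypothetical, and treating Pasting as sampling without replacement is an interpretation of the contrast the quote draws with Bagging.

```python
import random


def draw_subset(n_samples, n_features, sample_frac, feature_frac, replace_samples, rng):
    """Draw the row and column indices one base estimator would be trained on."""
    k = max(1, int(sample_frac * n_samples))
    m = max(1, int(feature_frac * n_features))
    if replace_samples:
        rows = [rng.randrange(n_samples) for _ in range(k)]  # with replacement
    else:
        rows = rng.sample(range(n_samples), k)               # without replacement
    cols = sorted(rng.sample(range(n_features), m))          # features, no repeats
    return rows, cols


rng = random.Random(0)  # fixed seed only for reproducibility

# Pasting: a random subset of the samples, all features.
pasting = draw_subset(100, 8, 0.5, 1.0, False, rng)
# Bagging: samples drawn with replacement, all features.
bagging = draw_subset(100, 8, 1.0, 1.0, True, rng)
# Random Subspaces: all samples, a random subset of the features.
subspaces = draw_subset(100, 8, 1.0, 0.5, False, rng)
# Random Patches: random subsets of both samples and features.
patches = draw_subset(100, 8, 0.5, 0.5, False, rng)
```

Each returned pair would then be used to slice the training matrix before fitting one base estimator.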
In scikit-learn, bagging methods are offered as a unified BaggingClassifier meta-estimator (resp. BaggingRegressor), taking as input a user-specified base estimator along with parameters specifying the strategy to draw random subsets. In particular, max_samples and max_features control the size of the subsets (in terms of samples and features), while bootstrap and bootstrap_features control whether samples and features are drawn with or without replacement. When using a subset of the available samples the generalization accuracy can be estimated with the out-of-bag samples by setting oob_score=True. As an example, the snippet below illustrates how to instantiate a bagging ensemble of KNeighborsClassifier base estimators, each built on random subsets of 50% of the samples and 50% of the features.
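The snippet the quote refers to is not reproduced in this entry. A minimal sketch matching its description, assuming a reasonably recent scikit-learn release, might look like the following; the Iris dataset, the fixed random_state, and the fit/predict calls are illustrative additions and not part of the quoted text.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

# Each KNN base estimator is built on a random 50% of the samples
# and a random 50% of the features, as the quote describes.
bagging = BaggingClassifier(
    KNeighborsClassifier(),
    max_samples=0.5,
    max_features=0.5,
    random_state=0,  # assumption: fixed only for reproducibility
)

# Illustration only: Iris is our choice of data, not the documentation's.
X, y = load_iris(return_X_y=True)
bagging.fit(X, y)
predictions = bagging.predict(X)
```

The base estimator is passed positionally here because the name of that keyword argument has changed between scikit-learn releases.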
2017d
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Bootstrap_aggregating Retrieved: 2017-10-22.
- QUOTE: Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.