sklearn.ensemble.GradientBoostingClassifier
A sklearn.ensemble.GradientBoostingClassifier is a Gradient Boosting Classification System within the sklearn.ensemble module.
- AKA: GradientBoostingClassifier.
- Context
- Usage:
- 1) Import Gradient Tree Boosting Classification System from scikit-learn:
from sklearn.ensemble import GradientBoostingClassifier
- 2) Create design matrix X and response vector Y
- 3) Create Gradient Tree Boosting Classifier object:
BC=GradientBoostingClassifier([loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, ...])
- 4) Choose method(s):
apply(X), applies trees in the ensemble to X, returns leaf indices.
decision_function(X), computes the decision function of X.
fit(X, y[, sample_weight, monitor]), fits the gradient boosting model.
get_params([deep]), gets parameters for this estimator.
predict(X), predicts class for X.
predict_log_proba(X), predicts class log-probabilities for X.
predict_proba(X), predicts class probabilities for X.
score(X, y[, sample_weight]), returns the mean accuracy on the given test data and labels.
set_params(**params), sets the parameters of this estimator.
staged_decision_function(X), computes the decision function of X for each iteration.
staged_predict(X), predicts class at each stage for X.
staged_predict_proba(X), predicts class probabilities at each stage for X.
- Example(s):
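A minimal end-to-end sketch following steps 1)–4) above; the synthetic dataset (via make_classification), the train/test split, and the hyperparameter values are illustrative assumptions, not part of the source:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# 2) design matrix X and response vector y (synthetic, binary classes)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3) classifier object with a few common hyperparameters
BC = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100,
                                max_depth=3, random_state=0)

# 4) fit, predict, and score
BC.fit(X_train, y_train)
y_pred = BC.predict(X_test)        # predicted class per test sample
proba = BC.predict_proba(X_test)   # class probabilities, shape (n_samples, 2)
acc = BC.score(X_test, y_test)     # mean accuracy on the test split
```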
- Counter-Example(s):
sklearn.ensemble.GradientBoostingRegressor
sklearn.ensemble.BaggingRegressor
sklearn.ensemble.BaggingClassifier
sklearn.ensemble.RandomForestRegressor
sklearn.ensemble.RandomForestClassifier
sklearn.ensemble.AdaBoostClassifier
sklearn.ensemble.AdaBoostRegressor
sklearn.ensemble.ExtraTreesClassifier
sklearn.ensemble.ExtraTreesRegressor
sklearn.ensemble.IsolationForest
sklearn.ensemble.RandomTreesEmbedding
sklearn.ensemble.VotingClassifier
- See: Decision Tree, Classification System, Regularization Task, Ridge Regression Task, Kernel-based Classification Algorithm.
References
2017a
- (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html Retrieved:2017-10-22.
- QUOTE:
class sklearn.ensemble.GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')
Gradient Boosting for classification.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced. Read more in the User Guide.
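The forward stage-wise additive fit described above can be observed through the staged_* methods, which yield one prediction per boosting iteration. A short sketch (the synthetic dataset and stage count are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# synthetic binary-classification data; sizes are illustrative
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# staged_predict yields one prediction array per boosting stage,
# so the additive model can be inspected after each added tree
stage_acc = [(y_stage == y).mean() for y_stage in clf.staged_predict(X)]
```

Because each stage adds a tree fit to the negative gradient of the loss, training accuracy typically improves (or holds) as stages accumulate.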
2017b
- (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting Retrieved:2017-10-22.
- QUOTE: Gradient Tree Boosting or Gradient Boosted Regression Trees (GBRT) is a generalization of boosting to arbitrary differentiable loss functions. GBRT is an accurate and effective off-the-shelf procedure that can be used for both regression and classification problems. Gradient Tree Boosting models are used in a variety of areas including Web search ranking and ecology.
The advantages of GBRT are:
- Natural handling of data of mixed type (= heterogeneous features)
- Predictive power
- Robustness to outliers in output space (via robust loss functions)
- The disadvantages of GBRT are:
- Scalability: due to the sequential nature of boosting, it can hardly be parallelized.
- The module
sklearn.ensemble
provides methods for both classification and regression via gradient boosted regression trees.
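A companion sketch for the regression side of the module, using GradientBoostingRegressor (the synthetic data via make_regression is an illustrative assumption):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# synthetic regression data; sizes are illustrative
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

reg = GradientBoostingRegressor(n_estimators=100, random_state=0)
reg.fit(X, y)

r2 = reg.score(X, y)  # coefficient of determination on the training data
```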