sklearn.ensemble.ExtraTreesClassifier
A sklearn.ensemble.ExtraTreesClassifier is an Extremely Randomized Trees Classification System within the sklearn.ensemble module.
- AKA: ExtraTreesClassifier.
- Context
- Usage:
- 1) Import Extremely Randomized Trees Classification System from scikit-learn:
from sklearn.ensemble import ExtraTreesClassifier
- 2) Create design matrix X and response vector Y.
- 3) Create Extremely Randomized Trees Classifier object:
clf = ExtraTreesClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, ...)
- 4) Choose method(s) (an end-to-end usage sketch follows this list):
apply(X), applies trees in the forest to X, returning leaf indices.
decision_path(X), returns the decision path in the forest.
fit(X, y[, sample_weight]), builds a forest of trees from the training set (X, y).
get_params([deep]), retrieves the parameters for this estimator.
predict(X), predicts class for X.
predict_log_proba(X), predicts class log-probabilities for X.
predict_proba(X), predicts class probabilities for X.
score(X, y[, sample_weight]), returns the mean accuracy on the given test data and labels.
set_params(**params), sets the parameters of this estimator.
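The usage steps above can be combined into a minimal end-to-end sketch. This is an illustration only, assuming a synthetic dataset built with scikit-learn's make_classification helper; the data and parameter values are not prescriptive:
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

# Synthetic design matrix X and response vector Y (toy data for illustration)
X, Y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Create the Extremely Randomized Trees Classifier object
clf = ExtraTreesClassifier(n_estimators=10, criterion='gini', random_state=0)

# Choose methods: fit, then predicted classes, class probabilities, mean accuracy
clf.fit(X_train, Y_train)
print(clf.predict(X_test[:5]))
print(clf.predict_proba(X_test[:5]))
print(clf.score(X_test, Y_test))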
- Example(s):
- Counter-Example(s):
sklearn.ensemble.ExtraTreesRegressor,
sklearn.ensemble.AdaBoostClassifier,
sklearn.ensemble.AdaBoostRegressor,
sklearn.ensemble.BaggingClassifier,
sklearn.ensemble.BaggingRegressor,
sklearn.ensemble.GradientBoostingClassifier,
sklearn.ensemble.GradientBoostingRegressor,
sklearn.ensemble.IsolationForest,
sklearn.ensemble.RandomForestClassifier,
sklearn.ensemble.RandomForestRegressor,
sklearn.ensemble.RandomTreesEmbedding,
sklearn.ensemble.VotingClassifier.
- See: Decision Tree, Classification System, Regularization Task, Ridge Regression Task, Kernel-based Classification Algorithm.
References
2017a
- (Scikit-Learn, 2017a) ⇒ http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
- QUOTE:
class sklearn.ensemble.ExtraTreesClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)
This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Read more in the User Guide
(...)
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
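As a sketch of the point above, the following compares an unconstrained forest to one whose tree size is limited via max_depth and min_samples_leaf; the dataset and the resulting node counts are illustrative assumptions, not outputs quoted from the documentation:
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, Y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Default parameters: fully grown, unpruned trees (potentially large)
full = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, Y)

# Constrained trees: shallower, with larger leaves, hence far fewer nodes
small = ExtraTreesClassifier(n_estimators=10, max_depth=5, min_samples_leaf=10, random_state=0).fit(X, Y)

# Total node counts across the forest show the reduction in model size
print(sum(t.tree_.node_count for t in full.estimators_))
print(sum(t.tree_.node_count for t in small.estimators_))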
2017b
- (Scikit-Learn, 2017b) ⇒ http://scikit-learn.org/stable/modules/ensemble.html#extremely-randomized-trees
- QUOTE: In extremely randomized trees (see ExtraTreesClassifier and ExtraTreesRegressor classes), randomness goes one step further in the way splits are computed. As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative thresholds, thresholds are drawn at random for each candidate feature and the best of these randomly-generated thresholds is picked as the splitting rule. This usually allows to reduce the variance of the model a bit more, at the expense of a slightly greater increase in bias (...)
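A small sketch of this comparison, assuming scikit-learn's RandomForestClassifier as the baseline and cross_val_score for evaluation; the relative scores depend on the dataset and are not claimed by the quoted passage:
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, Y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Same ensemble size for both; only the split-threshold strategy differs
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    scores = cross_val_score(Model(n_estimators=100, random_state=0), X, Y, cv=5)
    print(Model.__name__, scores.mean())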
2006
- (Geurts et al., 2006) ⇒ Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42. https://doi.org/10.1007/s10994-006-6226-1
- ABSTRACT: This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of randomizing strongly both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided as well as a geometrical and a kernel characterization of the models induced.
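The tunable randomization-strength parameter discussed in the abstract is the number of attributes randomly selected at each node (K in Geurts et al., 2006), which in scikit-learn is assumed here to correspond to max_features. A minimal sketch of sweeping it:
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, Y = make_classification(n_samples=1000, n_features=20, random_state=0)

# max_features plays the role of the attribute-selection parameter K:
# smaller values mean stronger randomization (K=1 is totally randomized splits)
for k in (1, 5, 20):
    clf = ExtraTreesClassifier(n_estimators=100, max_features=k, random_state=0)
    print(k, cross_val_score(clf, X, Y, cv=5).mean())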