Ensemble-based Prediction Function

An Ensemble-based Prediction Function is a composite function structure that is composed of many "base" prediction functions.

Context:
- It can benefit from the individual strengths of the "base" Decision Function.
- It can range from being a Data Driven Ensemble Function (e.g. produced by ensemble learning system) to being a Heuristic Ensemble Function (e.g. produced by Crowdsourcing).
- It can range from being Ensemble-based Classification Function to being an Ensemble-based Ranking Function to being an Ensemble-based Numeric Prediction Function.
- …
Example(s):
- an Decision Tree Ensemble: File:Ensemble Decision Tree.jpg.
- a Stacked Model.
- …
Counter-Example(s):
- a Logistic Classifier.
- a Neural Network-based Model.
See: No Free Lunch, Supervised Learning.

References

2017A

(Brown, 2017) ⇒ Gavin Brown (2017). "Ensemble Learning" In: "Encyclopedia of Machine Learning and Data Mining" pp 393-402
- QUOTE: Ensemble learning refers to the procedures employed to train multiple learning machines and combine their outputs, treating them as a “committee” of decision makers. The principle is that the decision of the committee, with individual predictions combined appropriately, should have better overall accuracy, on average, than any individual committee member. Numerous empirical and theoretical studies have demonstrated that ensemble models very often attain higher accuracy than single models.
  The members of the ensemble might be predicting real-valued numbers, class labels, posterior probabilities, rankings, clusterings, or any other quantity. Therefore, their decisions can be combined by many methods, including averaging, voting, and probabilistic methods. The majority of ensemble learning methods are generic, applicable across broad classes of model types and learning tasks.

2017B

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Decision_tree_learning#Decision_tree_types Retrieved:2017-10-15.
- Decision trees used in data mining are of two main types:
  - Classification tree analysis is when the predicted outcome is the class to which the data belongs.
  - Regression tree analysis is when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
- The term Classification And Regression Tree (CART) analysis is an umbrella term used to refer to both of the above procedures, first introduced by Breiman et al.^[1] Trees used for regression and trees used for classification have some similarities - but also some differences, such as the procedure used to determine where to split.
  Some techniques, often called ensemble methods, construct more than one decision tree:
  - Boosted trees Incrementally building an ensemble by training each new instance to emphasize the training instances previously mis-modeled. A typical example is AdaBoost. These can be used for regression-type and classification-type problems. ^[2] ^[3]
  - Bootstrap aggregated (or bagged) decision trees, an early ensemble method, builds multiple decision trees by repeatedly resampling training data with replacement, and voting the trees for a consensus prediction. ^[4]
    - A random forest classifier is a specific type of bootstrap aggregating
  - Rotation forest - in which every decision tree is trained by first applying principal component analysis (PCA) on a random subset of the input features. ^[5]
    A special case of a decision tree is a decision list, which is a one-sided decision tree, so that every internal node has exactly 1 leaf node and exactly 1 internal node as a child (except for the bottommost node, whose only child is a single leaf node). While less expressive, decision lists are arguably easier to understand than general decision trees due to their added sparsity, permit non-greedy learning methods and monotonic constraints to be imposed.
    Decision tree learning is the construction of a decision tree from class-labeled training tuples. A decision tree is a flow-chart-like structure, where each internal (non-leaf) node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf (or terminal) node holds a class label. The topmost node in a tree is the root node.
    There are many specific decision-tree algorithms. Notable ones include:
  - ID3 (Iterative Dichotomiser 3) * C4.5 (successor of ID3)
  - CART (Classification And Regression Tree)
  - CHAID (CHi-squared Automatic Interaction Detector). Performs multi-level splits when computing classification trees.
  - MARS: extends decision trees to handle numerical data better.
  - Conditional Inference Trees. Statistics-based approach that uses non-parametric tests as splitting criteria, corrected for multiple testing to avoid overfitting. This approach results in unbiased predictor selection and does not require pruning.^[6] ^[7]
- ID3 and CART were invented independently at around the same time (between 1970 and 1980), yet follow a similar approach for learning decision tree from training tuples.

↑ Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8.
↑ Friedman, J. H. (1999). Stochastic gradient boosting. Stanford University.
↑ Hastie, T., Tibshirani, R., Friedman, J. H. (2001). The elements of statistical learning : Data mining, inference, and prediction. New York: Springer Verlag.
↑ Breiman, L. (1996). Bagging Predictors. “Machine Learning, 24": pp. 123-140.
↑ Rodriguez, J.J. and Kuncheva, L.I. and Alonso, C.J. (2006), Rotation forest: A new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619-1630.
↑ Hothorn, T.; Hornik, K.; Zeileis, A. (2006). “Unbiased Recursive Partitioning: A Conditional Inference Framework". Journal of Computational and Graphical Statistics. 15 (3): 651–674. JSTOR 27594202. doi:10.1198/106186006X133933.
↑ Strobl, C.; Malley, J.; Tutz, G. (2009). “An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests". Psychological Methods. 14 (4): 323–348. doi:10.1037/a0016973.

[bfos-1] Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8.

[2] Friedman, J. H. (1999). Stochastic gradient boosting. Stanford University.

[3] Hastie, T., Tibshirani, R., Friedman, J. H. (2001). The elements of statistical learning : Data mining, inference, and prediction. New York: Springer Verlag.

[4] Breiman, L. (1996). Bagging Predictors. “Machine Learning, 24": pp. 123-140.

[5] Rodriguez, J.J. and Kuncheva, L.I. and Alonso, C.J. (2006), Rotation forest: A new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619-1630.

[Hothorn2006-6] Hothorn, T.; Hornik, K.; Zeileis, A. (2006). “Unbiased Recursive Partitioning: A Conditional Inference Framework". Journal of Computational and Graphical Statistics. 15 (3): 651–674. JSTOR 27594202. doi:10.1198/106186006X133933.

[Strobl2009-7] Strobl, C.; Malley, J.; Tutz, G. (2009). “An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests". Psychological Methods. 14 (4): 323–348. doi:10.1037/a0016973.

[1]

[2]

[3]

[4]

[5]

[6]

[7]