Regression Tree Post-Pruning Algorithm
A Regression Tree Post-Pruning Algorithm is a Decision Tree Post-Pruning Algorithm that can solve a Regression Tree Post-Pruning Task.
- Context:
- It can be applied by a Regression Tree Post-Pruning System (typically as part of a regression tree learning system).
- …
- Counter-Example(s):
- See: Model Complexity Reduction, Model Error Reduction, Overfitted Regression Function.
References
2017
- (Torgo, 2017) ⇒ Luís Torgo (2017). "Regression Trees". In: Encyclopedia of Machine Learning and Data Mining, pp. 1080-1083.
- QUOTE: As with most nonparametric modeling techniques, regression trees may over-fit the training data, which will inevitably lead to poor out-of-sample predictive performance. The standard procedure to fight this undesirable effect is to grow an overly large tree and then to use some reliable error estimation procedure to find the “best” sub-tree of this large model. This procedure is known as post-pruning a tree (Breiman et al. 1984 [1]). An alternative is to stop tree growth sooner, in a process known as pre-pruning, which again needs to be guided by reliable error estimation to know when over-fitting is starting to occur. Although more efficient in computational terms, this latter alternative may lead to stopping tree growth too soon, even with look-ahead mechanisms.
Post-pruning is usually carried out in a three-stage procedure: (i) a set of sub-trees of the initial tree is generated, (ii) some reliable error estimation procedure is used to obtain estimates for each member of this set, and (iii) some method based on these estimates is used to select one of these trees as the final tree model. Different methods exist for each of these steps. A common setup (e.g., Breiman et al. 1984) is to use error-complexity pruning to generate a sequence of nested sub-trees, whose error is then estimated by cross-validation. The final tree is selected using the x-SE rule, which starts with the lowest estimated error sub-tree and then selects the smallest tree within the interval of x standard errors of the lowest estimated error tree (a frequent setting is to use 1 standard error).
Variations on the subject of pruning regression trees include, among others, pre-pruning alternatives (e.g., Breiman and Meisel 1976 [2] and Friedman 1979 [3]), the use of different tree error estimators (see Torgo (1998) [4] for a comparative study and references to different alternatives), and the use of the MDL principle to guide the pruning (Robnik-Šikonja and Kononenko 1998 [5]).
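The three-stage procedure quoted above can be sketched in code. The following is a minimal illustration, assuming scikit-learn's cost_complexity_pruning_path as a stand-in for error-complexity pruning; the synthetic dataset and all parameter settings are illustrative assumptions, not part of the encyclopedia entry.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Stage (i): grow an overly large tree and enumerate its nested sequence
# of cost-complexity sub-trees, each indexed by a complexity parameter alpha.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas

# Stage (ii): estimate each sub-tree's error by 10-fold cross-validation.
mean_err, se_err = [], []
for alpha in alphas:
    scores = -cross_val_score(
        DecisionTreeRegressor(ccp_alpha=alpha, random_state=0),
        X, y, cv=10, scoring="neg_mean_squared_error")
    mean_err.append(scores.mean())
    se_err.append(scores.std(ddof=1) / np.sqrt(len(scores)))
mean_err, se_err = np.array(mean_err), np.array(se_err)

# Stage (iii): the x-SE rule with x = 1. Choose the smallest sub-tree
# (largest alpha) whose estimated error lies within one standard error
# of the lowest estimated error.
best = mean_err.argmin()
chosen_alpha = alphas[mean_err <= mean_err[best] + se_err[best]].max()

pruned = DecisionTreeRegressor(ccp_alpha=chosen_alpha, random_state=0).fit(X, y)
print(f"alpha={chosen_alpha:.4f}, leaves={pruned.get_n_leaves()}")
```

In this formulation, larger values of ccp_alpha correspond to smaller sub-trees, so taking the largest alpha within one standard error of the minimum expresses the 1-SE preference for the smallest adequately accurate tree.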
1998
- (Robnik-Šikonja & Kononenko, 1998) ⇒ Marko Robnik-Šikonja, and Igor Kononenko. (1998). “Pruning Regression Trees with MDL.” In: Proceedings of the 13th European Conference on Artificial Intelligence (ECAI 1998).
- ABSTRACT: Pruning is a method for reducing the error and complexity of induced trees. There are several approaches to pruning decision trees, while regression trees have attracted less attention. We propose a method for pruning regression trees based on the sound foundations of the MDL principle. We develop coding schemes for various constructs and models in the leaves and empirically test the new method against two well-known pruning algorithms. The results are favourable to the new method.
- QUOTE: Our algorithm for pruning regression trees is based on the ideas of [10, 6]. Namely, in each interior node we compare the code of the model in that node with the codes of the subtrees and their respective models. If the latter is larger than the former, the subtrees are pruned and the node is turned into a leaf. We recursively repeat this procedure from leaves to the root of the tree.
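The quoted bottom-up comparison can be expressed as a short recursive routine. The sketch below is a hedged illustration: the `Node` structure and its `leaf_cost`/`split_cost` fields are hypothetical placeholders standing in for the paper's MDL coding schemes, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A hypothetical regression-tree node; costs are code lengths in bits."""
    leaf_cost: float               # code length of the leaf model at this node
    split_cost: float = 0.0        # code length of encoding the split test
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def mdl_prune(node: Node) -> float:
    """Prune the subtree rooted at `node` in place; return its final code length."""
    if node.is_leaf():
        return node.leaf_cost
    # Recurse first, so pruning decisions propagate from the leaves to the root.
    subtree_cost = node.split_cost + mdl_prune(node.left) + mdl_prune(node.right)
    # If coding the subtrees and their models is more expensive than coding
    # a single leaf model here, collapse this interior node into a leaf.
    if subtree_cost > node.leaf_cost:
        node.left = node.right = None
        return node.leaf_cost
    return subtree_cost

# Toy example: 1 + 6 + 5 = 12 bits for the subtree versus 10 bits for a leaf,
# so the root is turned into a leaf.
root = Node(leaf_cost=10.0, split_cost=1.0,
            left=Node(leaf_cost=6.0), right=Node(leaf_cost=5.0))
mdl_prune(root)
assert root.is_leaf()
```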
- ↑ Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Statistics/probability series. Wadsworth & Brooks/Cole Advanced Books & Software, Belmont
- ↑ Breiman L, Meisel WS (1976) General estimates of the intrinsic variability of data in nonlinear regression models. J Am Stat Assoc 71:301–307
- ↑ Friedman JH (1979) A tree-structured approach to nonparametric multiple regression. In: Gasser T, Rosenblatt M (eds) Smoothing techniques for curve estimation. Lecture notes in mathematics, vol 757. Springer, Berlin/New York, pp 5–22
- ↑ Torgo L (1998) Error estimates for pruning regression trees. In: Nedellec C, Rouveirol C (eds) Proceedings of the 10th European Conference on Machine Learning, Chemnitz. LNAI, vol 1398. Springer
- ↑ Robnik-Šikonja M, Kononenko I (1998) Pruning regression trees with MDL. In: Proceedings of ECAI-98, Brighton