Gradient Descent Boosting-based Decision Tree (GBDT) Learning Algorithm
A Gradient Descent Boosting-based Decision Tree (GBDT) Learning Algorithm is a gradient descent boosting-based learning algorithm that applies gradient descent optimization to an ensemble of decision tree models, training each successive tree to correct the errors of the trees fitted before it (a minimal training-loop sketch appears below, just before the references).
- Context:
- It can (typically) create a Decision Tree Ensemble.
- It can be implemented by a GBDT System (that solves a GBDT task).
- It can range from being a GBDT Classification Algorithm to being a GBDT Regression Algorithm.
- It can handle various data types, including categorical and continuous variables.
- It can be sensitive to the settings of hyperparameters, such as learning rate and tree depth, which require careful tuning for optimal performance.
- It can be computationally intensive, especially with large datasets and many trees.
- ...
- Example(s):
- the one implemented in CatBoost, an open-source GBDT library that improves the handling of categorical features, as described by Liudmila Prokhorenkova et al. (2018).
- the one implemented in TreeBoost (e.g., in DTREG), Friedman's GBDT variant that is optimized for decision tree base learners and known for its high predictive accuracy.
- the one proposed in the Gradient Boosting Machine (GBM) framework by Jerome H. Friedman (2001).
- ...
- Counter-Example(s):
- a Random Forest Learning Algorithm, which builds its decision trees independently and in parallel rather than in an error-correcting sequence.
- See: Boosting Algorithm, Decision Tree Training Algorithm, Gradient Descent Algorithm.
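The following is a minimal sketch of the sequential, error-correcting training loop described above, specialized to squared-error regression. It assumes scikit-learn's DecisionTreeRegressor as the base learner; the helper names (fit_gbdt, predict_gbdt), the 0.1 learning rate, and the depth-3 trees are illustrative assumptions rather than the settings of any particular GBDT library.

```python
# Minimal GBDT sketch for squared-error regression (illustrative, not a
# reference implementation). Each tree is fit to the residuals -- the
# pseudo-residuals for squared-error loss -- of the ensemble built so far.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    X, y = np.asarray(X), np.asarray(y, dtype=float)
    f0 = float(np.mean(y))                 # F_0: constant initial prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred               # pseudo-residuals for L2 loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)   # F_m = F_{m-1} + nu * h_m
        trees.append(tree)
    return f0, learning_rate, trees

def predict_gbdt(model, X):
    f0, learning_rate, trees = model
    pred = np.full(np.asarray(X).shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```

The learning rate (shrinkage) and tree depth used here are exactly the hyperparameters flagged as tuning-sensitive in the context notes above.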
References
2024
- (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting Retrieved:2024-3-13.
- Gradient boosting is typically used with decision trees (especially CARTs) of a fixed size as base learners. For this special case, Friedman proposes a modification to the gradient boosting method which improves the quality of fit of each base learner.
Generic gradient boosting at the m-th step would fit a decision tree [math]\displaystyle{ h_m(x) }[/math] to pseudo-residuals. Let [math]\displaystyle{ J_{m} }[/math] be the number of its leaves. The tree partitions the input space into [math]\displaystyle{ J_{m} }[/math] disjoint regions [math]\displaystyle{ R_{1m}, \ldots, R_{J_{m}m} }[/math] and predicts a constant value in each region. Using the indicator notation, the output of [math]\displaystyle{ h_m(x) }[/math] for input x can be written as the sum:
: [math]\displaystyle{ h_m(x) = \sum_{j=1}^{J_{m}} b_{jm} \mathbf {1}_{R_{jm}}(x), }[/math]
where [math]\displaystyle{ b_{jm} }[/math] is the value predicted in the region [math]\displaystyle{ R_{jm} }[/math].[1]
Then the coefficients [math]\displaystyle{ b_{jm} }[/math] are multiplied by some value [math]\displaystyle{ \gamma_m }[/math], chosen using line search so as to minimize the loss function, and the model is updated as follows:
: [math]\displaystyle{ F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad \gamma_m = \underset{\gamma}{\operatorname{arg\,min}} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)). }[/math]
Friedman proposes to modify this algorithm so that it chooses a separate optimal value [math]\displaystyle{ \gamma_{jm} }[/math] for each of the tree's regions, instead of a single [math]\displaystyle{ \gamma_m }[/math] for the whole tree. He calls the modified algorithm "TreeBoost". The coefficients [math]\displaystyle{ b_{jm} }[/math] from the tree-fitting procedure can then simply be discarded and the model update rule becomes:
: [math]\displaystyle{ F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J_{m}} \gamma_{jm} \mathbf {1}_{R_{jm}}(x), \quad \gamma_{jm} = \underset{\gamma}{\operatorname{arg\,min}} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma). }[/math]
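The per-region update just quoted can be sketched as follows. This is an illustrative sketch, not Friedman's or DTREG's implementation: it assumes scikit-learn's DecisionTreeRegressor and SciPy's minimize_scalar, uses plain residuals in place of the general pseudo-residuals (exact for squared-error loss), and the function name treeboost_step is hypothetical.

```python
# Sketch of one TreeBoost step: fit h_m to pseudo-residuals, then choose a
# separate step gamma_{jm} per leaf region R_{jm} by minimizing the loss over
# the training points that fall in that leaf.
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeRegressor

def treeboost_step(X, y, prev_pred, loss, max_depth=3):
    y, prev_pred = np.asarray(y, dtype=float), np.asarray(prev_pred, dtype=float)
    # Plain residuals stand in for the pseudo-residuals; for a general loss
    # they would be the negative gradient of L at F_{m-1}(x_i).
    residuals = y - prev_pred
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)

    leaf_ids = tree.apply(X)            # which region R_{jm} each x_i falls into
    gammas = {}
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        # gamma_{jm} = argmin_gamma sum_{x_i in R_{jm}} L(y_i, F_{m-1}(x_i) + gamma)
        gammas[leaf] = minimize_scalar(
            lambda g, m=mask: float(np.sum(loss(y[m], prev_pred[m] + g)))
        ).x

    # F_m(x) = F_{m-1}(x) + sum_j gamma_{jm} 1_{R_{jm}}(x)
    new_pred = prev_pred + np.array([gammas[l] for l in leaf_ids])
    return tree, gammas, new_pred
```

For squared-error loss (loss = lambda t, p: (t - p) ** 2) the minimizer reduces to the mean residual in each leaf, which matches the footnote given under the Friedman (2001) entry below.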
2018
- (Prokhorenkova, et al., 2018) ⇒ Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. (2018). “CatBoost: Unbiased Boosting with Categorical Features.” In: Advances in Neural Information Processing Systems 31 (NeurIPS 2018).
2009
- http://www.dtreg.com/treeboost.htm
- TreeBoost - Stochastic Gradient Boosting
- "Boosting" is a technique for improving the accuracy of a predictive function by applying the function repeatedly in a series and combining the output of each function with weighting so that the total error of the prediction is minimized. In many cases, the predictive accuracy of such a series greatly exceeds the accuracy of the base function used alone.
- The TreeBoost algorithm used by DTREG is optimized for improving the accuracy of models built on decision trees. Research has shown that models built using TreeBoost are among the most accurate of any known modeling technique.
- The TreeBoost algorithm is functionally similar to Decision Tree Forests because it creates a tree ensemble and uses randomization during tree creation. However, a random forest builds its trees in parallel and they "vote" on the prediction, whereas TreeBoost creates a series of trees, and the prediction is incrementally improved by each tree in the series.
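The series-versus-parallel contrast drawn above can be illustrated with the following sketch. It assumes scikit-learn and a synthetic regression dataset; the hyperparameter values are arbitrary, and this is an illustration of the two ensemble styles rather than DTREG's TreeBoost implementation.

```python
# Contrast of the two ensemble styles: boosting improves the fit stage by
# stage, while a random forest averages independently built trees.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# Boosting: each stage adds a tree that corrects the current ensemble's errors.
gbdt = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gbdt.fit(X, y)
staged_mse = [mean_squared_error(y, p) for p in gbdt.staged_predict(X)]
print("training MSE after 1 / 10 / 100 boosting stages:",
      staged_mse[0], staged_mse[9], staged_mse[99])

# Random forest: trees are built independently and their predictions averaged.
forest = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=0)
forest.fit(X, y)
print("random forest training MSE:", mean_squared_error(y, forest.predict(X)))
```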
2001
- (Friedman, 2001) ⇒ Jerome H. Friedman. (2001). “Greedy Function Approximation: A Gradient Boosting Machine.” In: The Annals of Statistics, 29(5). doi:10.1214/aos/1013203451
- ↑ Note: in the case of usual CART trees, the trees are fitted using least-squares loss, and so the coefficient [math]\displaystyle{ b_{jm} }[/math] for the region [math]\displaystyle{ R_{jm} }[/math] is equal to just the value of the output variable, averaged over all training instances in [math]\displaystyle{ R_{jm} }[/math].