CART Algorithm
(Redirected from CART algorithm)
Jump to navigation
Jump to search
A CART algorithm is a decision tree training algorithm that uses a Gini impurity index as a decision tree splitting criterion.
- AKA: Classification and Regression Trees, CART.
- Context:
- It can (typically) perform 2-way Splits.
- It was first proposed in (Breiman et al., 1984).
- It can be applied by a CART-based Classification System (to solve a CART-based classification task)
- Example(s):
- an RPART algorithm, (a derivational algorithm).
- …
- Counter-Example(s):
- See: Leo Breiman.
References
2012
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees
- Classification and regression trees (CART) is a non-parametric decision tree learning technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively.
2011
- http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_tree-cart_split-criteria_categorical_gini.htm
- Gini Criterion (CART algorithms) The Gini impurity measure at a node t is defined as :[math]\displaystyle{ i(t)=Σi,jC(i|j)p(i|t)p(j|t) }[/math] The Gini splitting criterion is the decrease of impurity defined as :[math]\displaystyle{ Δi(s,t)=i(t)−pLi(tL)−pRi(tR) }[/math] where pL and pR are probabilities of sending a case to the left child node tL and to the right child node tR respectively. They are estimated as pL=p(tL)/p(t) and pR=p(tR)/p(t).
Note: When user-specified costs are involved, the altered priors can optionally be used to replace the priors. When altered priors are used, the problem is considered as if no costs are involved. The altered prior is defined as
- Gini Criterion (CART algorithms) The Gini impurity measure at a node t is defined as :[math]\displaystyle{ i(t)=Σi,jC(i|j)p(i|t)p(j|t) }[/math] The Gini splitting criterion is the decrease of impurity defined as :[math]\displaystyle{ Δi(s,t)=i(t)−pLi(tL)−pRi(tR) }[/math] where pL and pR are probabilities of sending a case to the left child node tL and to the right child node tR respectively. They are estimated as pL=p(tL)/p(t) and pR=p(tR)/p(t).
2009
- (Steinberg, 2009) ⇒ Dan Steinberg. (2009). “CART: Classification and Regression Trees.” In: (Wu & Kumar, 2009), "The Top Ten Algorithms in Data Mining.” Chapman & Hall. ISBN 1420089641
- QUOTE: The 1984 monograph, “CART: Classification and Regression Trees,” coauthored by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone (BFOS), represents a major milestone in the evolution of artificial intelligence, machine learning, nonparametric statistics, and data mining. The work is important for the comprehensiveness of its study of decision trees, the technical innovations it introduces, its sophisticated examples of tree-structured data analysis, and its authoritative treatment of large sample theory for trees. Since its publication the CART monograph has been cited some 3000 times according to the science and social science citation indexes; Google Scholar reports about 8,450 citations. CART citations can be found in almost any domain, with many appearing in fields such as credit risk, targeted marketing, financial markets modeling, electrical engineering, quality control, biology, chemistry, and clinical medical research. CART has also strongly influenced image compression …
2004
- (Raileanu & Stoffel, 2004) ⇒ Laura Elena Raileanu, and Kilian Stoffel. (2004). “Theoretical Comparison between the Gini Index and Information Gain Criteria.” In: Annals of Mathematics and Artificial Intelligence, 41(1). doi:10.1023/B:AMAI.0000018580.96245.c6
1984
- (Breiman et al., 1984) ⇒ Leo Breiman, Jerome H. Friedman, Charles J. Stone, and R. A. Olshen. (1984). “Classification and Regression Trees." Chapman & Hall. ISBN:0412048418
- Book Overview: The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.