CHAID Algorithm
A CHAID Algorithm is a decision tree training algorithm that uses a Chi-Square impurity function as a decision tree splitting criterion.
- AKA: CHAID, Chi Square Automatic Interaction Detection.
- Context
- It can also handle numerical data by making use of the AID heuristics.
- …
- Counter-Example(s):
- a CART Algorithm.
- a C4.5 Algorithm.
- See: Decision Tree Splitting Criterion, Impurity Function, Chi-Square Distribution.
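A minimal Python sketch of a chi-square splitting criterion, assuming categorical predictors and a categorical target stored as equal-length lists. For each candidate predictor it builds the predictor-by-target contingency table, computes Pearson's chi-square statistic, converts it to a p-value, and selects the predictor with the smallest p-value. The helper names (`chi_square_statistic`, `chi2_sf`, `best_split`) are hypothetical. The sketch deliberately omits CHAID's category merging and Bonferroni adjustment.

```python
import math
from collections import Counter


def chi_square_statistic(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for the
    contingency table of predictor categories xs against target classes ys."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    cells = Counter(zip(xs, ys))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n  # count expected under independence
            stat += (cells[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable with an
    integer number of degrees of freedom, via its closed-form series."""
    if dof <= 0 or x <= 0:
        return 1.0  # single category, or no evidence against independence
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: normal-tail term plus sum_{r=1}^{(dof-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    term, series = root, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * series


def best_split(predictors, target):
    """Return (name, p_value) for the predictor most significantly
    associated with the target (smallest chi-square p-value)."""
    best = None
    for name, xs in predictors.items():
        stat, dof = chi_square_statistic(xs, target)
        p = chi2_sf(stat, dof)
        if best is None or p < best[1]:
            best = (name, p)
    return best


target = ["yes", "yes", "no", "no", "yes", "no"]
predictors = {
    "region": ["north", "north", "south", "south", "north", "south"],
    "channel": ["web", "store", "web", "store", "web", "store"],
}
print(best_split(predictors, target))
```

With this toy data `region` separates the two classes perfectly, so it is selected over `channel`. Ranking predictors by p-value rather than by the raw statistic makes predictors with different numbers of categories (and hence different degrees of freedom) comparable.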
References
2011
- http://en.wikipedia.org/wiki/CHAID
- QUOTE: CHAID is a type of decision tree technique, based upon adjusted significance testing (Bonferroni testing). The technique was developed in South Africa and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on this topic. CHAID can be used for prediction (in a similar fashion to regression analysis, this version of CHAID being originally known as XAID) as well as classification, and for detection of interaction between variables. CHAID stands for CHi-squared Automatic Interaction Detector, based upon a formal extension of the US AID (Automatic Interaction Detector) and THAID (THeta Automatic Interaction Detector) procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including that performed in the UK in the 1950s.
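The Bonferroni adjustment mentioned in the quote can be sketched as follows. This assumes the multipliers usually attributed to Kass (1980), which should be checked against the original paper: a nominal predictor whose c categories have been merged into r groups could have been grouped in S(c, r) ways (a Stirling number of the second kind); an ordinal predictor, which may only merge adjacent categories, in C(c-1, r-1) ways; and the adjusted p-value is min(1, B * p). The function names are hypothetical.

```python
import math


def bonferroni_multiplier(c, r, ordinal=False):
    """Number of ways c predictor categories can be reduced to r groups
    (assumed Kass-style multiplier)."""
    if not 1 <= r <= c:
        raise ValueError("expected 1 <= r <= c")
    if ordinal:
        # Only adjacent categories may merge: choose r - 1 cut points among c - 1 gaps.
        return math.comb(c - 1, r - 1)
    # Nominal predictor: Stirling number of the second kind S(c, r),
    # i.e. (1 / r!) * sum_{i=0}^{r-1} (-1)^i * C(r, i) * (r - i)^c.
    total = sum((-1) ** i * math.comb(r, i) * (r - i) ** c for i in range(r))
    return total // math.factorial(r)


def adjusted_p_value(p, c, r, ordinal=False):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, bonferroni_multiplier(c, r, ordinal) * p)


# A nominal predictor with 4 categories merged into 2 groups:
print(bonferroni_multiplier(4, 2))  # 7
print(adjusted_p_value(0.004, 4, 2))
```

The multiplier grows quickly with the number of original categories, which penalizes predictors that had many ways to find a spuriously significant grouping.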
2004
- (Magidson & Vermunt, 2004) ⇒ J. Magidson, and J. K. Vermunt. (2004). “An Extension of the CHAID Tree-based Segmentation Algorithm to Multiple Dependent Variables.” In: Proceedings of the 28th Annual Conference of the Gesellschaft für Klassifikation eV, University of Dortmund.
1999
- (Zaiane, 1999) ⇒ Osmar Zaiane. (1999). “Glossary of Data Mining Terms.” University of Alberta, Computing Science CMPUT-690: Principles of Knowledge Discovery in Databases.
- QUOTE: CHAID: Chi Square Automatic Interaction Detection. A decision tree technique used for classification of a dataset. Provides a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. Segments a dataset by using chi square tests to create multi-way splits. Preceded, and requires more data preparation than, CART.
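The multi-way splits described in the glossary entry come from CHAID's category-merging step. The sketch below assumes a nominal predictor and a hypothetical merge threshold `alpha_merge` (default 0.05). It repeatedly finds the pair of predictor groups whose target distributions differ least significantly and merges them while that pair's p-value exceeds the threshold; each surviving group then becomes one branch of the split. The helper names are hypothetical, and the sketch omits ordinal predictors (which may only merge adjacent categories), re-splitting of merged groups, and the Bonferroni-adjusted test of the final grouping.

```python
import math
from collections import Counter


def chi_square_statistic(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for xs vs. ys."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def chi2_sf(x, dof):
    """Chi-square upper-tail probability for integer dof (closed-form series)."""
    if dof <= 0 or x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    root = math.sqrt(x)
    term, series = root, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * series


def merge_categories(xs, ys, alpha_merge=0.05):
    """Greedily merge nominal predictor categories whose target distributions
    do not differ significantly; returns a list of groups (frozensets)."""
    groups = [frozenset([x]) for x in sorted(set(xs), key=str)]
    while len(groups) > 1:
        best_pair, best_p = None, -1.0
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Sub-table restricted to the records in groups i and j.
                sub_x, sub_y = [], []
                for x, y in zip(xs, ys):
                    if x in groups[i] or x in groups[j]:
                        sub_x.append(x in groups[i])
                        sub_y.append(y)
                stat, dof = chi_square_statistic(sub_x, sub_y)
                p = chi2_sf(stat, dof)
                if p > best_p:
                    best_pair, best_p = (i, j), p
        if best_p <= alpha_merge:
            break  # every remaining pair differs significantly
        i, j = best_pair
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups


xs = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
ys = (["yes"] * 5 + ["no"] * 5) * 2 + ["no"] * 10
print(merge_categories(xs, ys))
```

Here `a` and `b` have identical class mixes and are merged, while `c` (all `no`) differs significantly from the merged group and stays separate, giving a two-way split. Unlike CART's binary splits, the number of branches is decided by the data.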