2004 TheoreticalComparBetGiniAndIGain

From GM-RKB

Subject Headings: Impurity Function, Gini Impurity Index, Information Gain Criteria.

Notes

Cited By

Quotes

Abstract

Knowledge Discovery in Databases (KDD) is an active and important research area with the promise for a high payoff in many business and scientific applications. One of the main tasks in KDD is classification. A particular efficient method for classification is decision tree induction. The selection of the attribute used at each node of the tree to split the data (split criterion) is crucial in order to correctly classify objects. Different split criteria were proposed in the literature (Information Gain, Gini Index, etc.). It is not obvious which of them will produce the best decision tree for a given data set. A large amount of empirical tests were conducted in order to answer this question. No conclusive results were found. In this paper we introduce a formal methodology, which allows us to compare multiple split criteria. This permits us to present fundamental insights into the decision process. Furthermore, we are able to present a formal description of how to select between split criteria for a given data set. As an illustration we apply the methodology to two widely used split criteria: Gini Index and Information Gain.
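To make the two split criteria named in the abstract concrete, the following is a minimal sketch using the standard textbook definitions of Gini impurity and Shannon entropy. It is not the paper's formal methodology. The function names (`class_probabilities`, `gini`, `entropy`, `impurity_reduction`) and the toy labels are assumptions introduced purely for illustration.

```python
from collections import Counter
from math import log2


def class_probabilities(labels):
    """Relative frequency of each class in a list of labels (hypothetical helper)."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum_i p_i * log2(p_i)."""
    return sum(-p * log2(p) for p in class_probabilities(labels))


def impurity_reduction(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the child partitions.

    With `entropy` this is the usual Information Gain; with `gini` it is the
    Gini-based gain used as a split criterion.
    """
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Toy data (assumed for illustration): 10 objects, two classes, one binary split.
parent = ["a"] * 5 + ["b"] * 5
split = [["a"] * 4 + ["b"], ["a"] + ["b"] * 4]

gini_gain = impurity_reduction(parent, split, gini)
info_gain = impurity_reduction(parent, split, entropy)
print(f"Gini gain: {gini_gain:.4f}, Information Gain: {info_gain:.4f}")
```

Passing the impurity function as a parameter keeps the split evaluation identical for both criteria, so any difference in the chosen attribute comes only from the impurity measure itself.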


References



2004 TheoreticalComparBetGiniAndIGain
Title: Theoretical Comparison between the Gini Index and Information Gain Criteria
URL: http://www2.unine.ch/files/content/sites/imi/files/shared/documents/papers/Gini index fulltext.pdf
DOI: 10.1023/B:AMAI.0000018580.96245.c6