Unbiased split selection for classification trees based on the Gini Index
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification with continuous predictors by means of a combinatorial approach. This distribution provides formal support for the variable selection bias in favor of variables with a high amount of missing values when the Gini gain is used as split selection criterion, and we suggest using the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation and real-data studies from veterinary gynecology in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method is extensible to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion.
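To illustrate the quantity the abstract refers to, the following is a minimal sketch (not the authors' code) of the Gini gain for a binary split on a continuous predictor, and of the maximally selected Gini gain, i.e. the maximum of that gain over all cutpoints:

```python
def gini(labels):
    """Gini impurity of a list of binary class labels (0/1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def max_gini_gain(x, y):
    """Maximally selected Gini gain of predictor x for binary response y.

    Sorts the observations by x and evaluates the impurity reduction
    at every possible cutpoint, returning the maximum.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent = gini([lab for _, lab in pairs])
    best = 0.0
    for k in range(1, n):  # cutpoint between rank k-1 and rank k
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        gain = parent - (k / n) * gini(left) - ((n - k) / n) * gini(right)
        best = max(best, gain)
    return best
```

Because the maximum is taken over more candidate cutpoints for variables with more distinct observed values, the raw criterion favors such variables; the paper's p-value of this maximally selected statistic corrects for that.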
Year of publication: 2005
Authors: Strobl, Carolin; Boulesteix, Anne-Laure; Augustin, Thomas
Publisher: München : Ludwig-Maximilians-Universität München, Sonderforschungsbereich 386 - Statistische Analyse diskreter Strukturen
Series: Discussion Paper ; 464
Type of publication: Book / Working Paper
Type of publication (narrower categories): Working Paper
Language: English
Availability: freely available
Other identifiers: 10.5282/ubm/epub.1833 [DOI]; 510826199 [GVK]; hdl:10419/31118 [Handle]
Persistent link: https://www.econbiz.de/10010266219
Similar items by person:
- Unbiased split selection for classification trees based on the Gini Index / Strobl, Carolin, (2005)
- Unbiased split selection for classification trees based on the Gini Index / Strobl, Carolin, (2007)
- Bernau, Christoph, (2013)
- More ...