2006
DOI: 10.1080/17445760600567883

A new metric splitting criterion for decision trees

Abstract: We examine a new approach to building decision trees by introducing a geometric splitting criterion based on the properties of a family of metrics on the space of partitions of a finite set. This criterion can be adapted to the characteristics of the data sets and the needs of the users, and it yields decision trees that are smaller and have fewer leaves than trees built with standard methods, with comparable or better accuracy.

Cited by 6 publications (7 citation statements)
References 7 publications
“…We have shown in [14] that the conditional β-entropy enjoys the property specified next. Theorem 2.3 Let π, σ, σ′ be three partitions of a finite set…”
Section: An Axiomatization Of Generalized Entropy
type: mentioning
confidence: 99%
“…These metrics are used for a variety of data mining tasks ranging from clustering [7,15] to classification [13,14] and discretization [10].…”
Section: Discussion
type: mentioning
confidence: 99%
“…Before defining the distance between sparse context trees we introduce the notion of β-entropy of a tree τ. Following Simovici and Szymon (2006) we define, for all β > 0,…”
Section: A Metric Space Of Sparse Trees
type: mentioning
confidence: 99%
“…With the notion of entropy and the definition of the maximal partition between two partitions, one derives the definition of distance introduced in (Simovici & Szymon, 2006). This is the distance we will use to study the similarity between protein sequences.…”
Section: A Metric Space Of Trees
type: unclassified
“…For this purpose, a distance between context trees, introduced in Simovici & Szymon (2006), is used. The study is carried out on sequences of globins and of fibroblast growth factors (FGF).…”
Section: Introduction
type: unclassified