2014 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2014.6889806
A novel application of Hoeffding's inequality to decision trees construction for data streams

Abstract: Decision trees are commonly applied tools in the task of data stream classification. The most critical point in a decision tree construction algorithm is the choice of the splitting attribute. In the majority of algorithms existing in the literature, the splitting criterion is based on statistical bounds derived for split measure functions. In this paper we propose an entirely new kind of splitting criterion. We derive statistical bounds for the arguments of the split measure function instead of deriving them for the split measure f…

Cited by 13 publications (10 citation statements) · References 29 publications
“…First, we compared the empirical behaviour of all our algorithms on the two-dimensional dataset banana, shown in Figure 1. The simplicity of this dataset allows us to show visually the difference between the four algorithms.…”
Section: B Comparison Among Our Methods
confidence: 99%
“…Incremental decision and rule tree learning systems, such as Very Fast Decision Tree (VFDT) [7] and Decision Rules (RULES) [12], use an incremental version of the split function computation; see also [22], [19], [8], [4].…”
Section: Related Work
confidence: 99%
“…We ran experiments on synthetic datasets and popular benchmarks, comparing our C-Tree (Algorithm 1) against two baselines: H-Tree (the VFDT algorithm [7]) and CorrH-Tree (the method from [8] using the classification error as splitting criterion). The bounds of [28] are not considered because of their conservativeness.…”
Section: Full Sampling Experiments
confidence: 99%
“…Alternative approaches, such as NIP-H and NIP-N, use Gaussian approximations instead of Hoeffding bounds in order to compute confidence intervals. Several extensions of VFDT have been proposed, also taking into account non-stationary data sources; see, e.g., [10], [9], [2], [35], [27], [15], [19], [21], [11], [34], [20], [29], [8]. All these methods are based on the classical Hoeffding bound [14]: after m independent observations of a random variable taking values in a real interval of size R, with probability at least 1 − δ the true mean does not differ from the sample mean by more than…”
Section: Introduction
confidence: 99%
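The classical Hoeffding bound mentioned in the excerpt above bounds the deviation between the true mean and the sample mean by ε = √(R² ln(1/δ) / (2m)). A minimal sketch of how a VFDT-style learner computes this half-width (the function name and example values are illustrative, not from the paper):

```python
import math

def hoeffding_epsilon(R: float, m: int, delta: float) -> float:
    """Half-width of the Hoeffding confidence interval.

    After m independent observations of a random variable ranging
    over an interval of size R, the true mean lies within +/- epsilon
    of the sample mean with probability at least 1 - delta.
    """
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * m))

# Example: a split measure in [0, 1] (R = 1), 1000 observed
# examples, 95% confidence (delta = 0.05).
eps = hoeffding_epsilon(R=1.0, m=1000, delta=0.05)
```

In a Hoeffding-tree learner, a leaf is split once the gap between the best and second-best splitting attribute exceeds this ε; as m grows, ε shrinks at rate O(1/√m), so the decision stabilizes with more data.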
“…This problem has become particularly important in recent years as the amount of collected data has increased. Therefore, researchers have paid special attention to a field of artificial intelligence called data stream mining (DSM) [1][2][3][4][5][6][7][8][9][10][11][12][13]. In the data stream scenario, instead of a static training set, we assume that the data come to the system continuously, one example after another.…”
Section: Introduction
confidence: 99%