2013
DOI: 10.1007/978-3-642-40988-2_30

Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them

Abstract: Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting, the underlying distribution from which this data comes may be changing and evolving, and so classifiers that can update themselves during operation are becoming the state-of-the-art. In this paper we show that data streams may have an important temporal component, which currently is not considered in the evaluation and benchmarking o…

Cited by 61 publications (52 citation statements)
References 20 publications

“…0 implies that the proposed model's predictions increasingly coincide with that of the naive model. Under classification tasks Bifet et al. [21] note that assuming, say, a Naive Bayes classifier or even a simple majority class 'coin toss', would be informative for the case of data conforming to the i.i.d. assumption.…”
Section: Discussion | Citation type: mentioning
Confidence: 99%

“…To further bridge the connections between our detection results and clustering results in [11], a recently developed measure, the Kappa Plus Statistic (KPS) [2,23], has been proposed. KPS, defined as $\kappa^{+} = \frac{p_0 - p_e}{1 - p_e}$, aims to evaluate data stream classifier performance taking into account temporal dependence as well as the effectiveness (or rationality) of classifier adaptation, where $p_0$ is the classifier's prequential accuracy and $p_e$ is the accuracy of the No-Change classifier.…”
Section: Copyright © by SIAM | Citation type: mentioning
Confidence: 99%

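The κ+ definition quoted above can be illustrated with a minimal Python sketch. It assumes that `y_pred` already holds the classifier's test-then-train (prequential) predictions, each made before that instance's label was revealed. It also assumes that the No-Change baseline predicts the label observed at the previous step. Because No-Change has nothing to predict for the first instance, this sketch scores both classifiers from the second instance onward. That alignment choice and the helper names `accuracy`, `no_change_predictions` and `kappa_plus` are hypothetical and illustrative, not code from the paper.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def no_change_predictions(y_true: Sequence[Hashable]) -> list:
    """No-Change baseline (assumed form): predict the previous true label.

    The prediction for instance t is y_true[t - 1], so this returns
    predictions for instances 1..n-1 only.
    """
    return list(y_true[:-1])


def kappa_plus(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """kappa+ = (p0 - pe) / (1 - pe), with pe the No-Change accuracy.

    Assumption: both classifiers are scored on instances 1..n-1 so that
    p0 and pe are computed over the same instances.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    truth = list(y_true[1:])
    p0 = accuracy(truth, list(y_pred[1:]))         # classifier accuracy
    pe = accuracy(truth, no_change_predictions(y_true))  # No-Change accuracy
    if pe == 1.0:
        raise ValueError("kappa+ is undefined when No-Change is perfect (pe = 1)")
    return (p0 - pe) / (1 - pe)


if __name__ == "__main__":
    # Toy stream with runs of repeated labels (temporal dependence).
    stream = ["a", "a", "a", "b", "b", "b", "a", "a"]
    majority = ["a"] * len(stream)  # always predict the majority class
    print(round(kappa_plus(stream, majority), 3))  # -0.5
```

On this made-up stream the majority-class predictor is right on 4 of the 7 scored instances, while No-Change is right on 5 of 7. κ+ is therefore negative, which signals that the classifier does worse than simply repeating the last observed label. A plain accuracy of about 57% would not reveal this.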
“…Evaluation is from a scientific point of view not less but more important than the design of ever newer and "better" (really?) methods, and is typically challenging and far from trivial, as has been discussed for other areas of data mining as well [1,2,7,9,20,22].…”
Section: So What? | Citation type: mentioning
Confidence: 99%