Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them

Bifet, Albert; Read, Jesse; Žliobaitė, Indrė; Pfahringer, Bernhard; Holmes, Geoff

doi:10.1007/978-3-642-40988-2_30

Cited by 61 publications

(52 citation statements)

References 20 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…0 implies that the proposed model's predictions increasingly coincide with that of the naive model. Under classification tasks Bifet et al [21] note that assuming, say, a Naive Bayes classifier or even a simple majority class 'coin toss', would be informative for the case of data conforming to the i.i.d. assumption.…”

Section: Discussionmentioning

confidence: 99%

Evolutionary model building under streaming data for classification tasks: opportunities and challenges

Heywood

2014

Genet Program Evolvable Mach

View full text Add to dashboard Cite

Streaming data analysis potentially represents a significant shift in emphasis from schemes historically pursued for offline (batch) approaches to the classification task. In particular, a streaming data application implies that: (1) the data itself has no formal 'start' or 'end'; (2) the properties of the process generating the data are non-stationary, thus models that function correctly for some part(s) of a stream may be ineffective elsewhere; (3) constraints on the time to produce a response, potentially implying an anytime operational requirement; and (4) given the prohibitive cost of employing an oracle to label a stream, a finite labelling budget is necessary. The scope of this article is to provide a survey of developments for model building under streaming environments from the perspective of both evolutionary and non-evolutionary frameworks. In doing so, we bring attention to the challenges and opportunities that developing solutions to streaming data classification tasks are likely to face using evolutionary approaches.

show abstract

Section: Discussionmentioning

confidence: 99%

Evolutionary model building under streaming data for classification tasks: opportunities and challenges

Heywood

2014

Genet Program Evolvable Mach

View full text Add to dashboard Cite

show abstract

“…To further bridge the connections between our detection results and clustering results in [11], a recently developed measurement -Kappa Plus Statistic (KPS) [2,23] -have been proposed. KPS, defined as κ + = p0−p e 1−p e , aims to evaluate data stream classifier performance taken into account temporal dependence as well as the effectiveness (or rationality) of classifier adaptation, where p 0 is the classifier's prequential accuracy and p e is the accuracy of No-Change classifier.…”

confidence: 99%

Concept Drift Detection with Hierarchical Hypothesis Testing

Yu¹,

Abraham²

2017

Proceedings of the 2017 SIAM International Conference on Data Mining

View full text Add to dashboard Cite

When using statistical models (such as a classifier) in a streaming environment, there is often a need to detect and adapt to concept drifts to mitigate any deterioration in the model's predictive performance over time. Unfortunately, the ability of popular concept drift approaches in detecting these drifts in the relationship of the response and predictor variable is often dependent on the distribution characteristics of the data streams, as well as its sensitivity on parameter tuning. This paper presents Hierarchical Linear Four Rates (HLFR), a framework that detects concept drifts for different data stream distributions (including imbalanced data) by leveraging a hierarchical set of hypothesis tests in an online setting. The performance of HLFR is compared to benchmark approaches using both simulated and real-world datasets spanning the breadth of concept drift types. HLFR significantly outperforms benchmark approaches in terms of accuracy, G-mean, recall, delay in detection and adaptability across the various datasets.

show abstract

“…Evaluation is from a scientific point of view not less but more important than the design of ever newer and "better" (really?) methods, and is typically challenging and far from trivial, as has been discussed for other areas of data mining as well [1,2,7,9,20,22].…”

Section: So What?mentioning

confidence: 99%

Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets

Basaran

Ntoutsi

Zimek³

2017

Proceedings of the 2017 SIAM International Conference on Data Mining

View full text Add to dashboard Cite

A collection of datasets crawled from Amazon, "Amazon reviews", is popular in the evaluation of recommendation systems. These datasets, however, contain redundancies (duplicated recommendations for variants of certain items). These redundancies went unnoticed in earlier use of these datasets and thus incurred to a certain extent wrong conclusions in the evaluation of algorithms tested on these datasets. We analyze the nature and amount of these redundancies and their impact on the evaluation of recommendation methods. While the general and obvious conclusion is that redundancies should be avoided and datasets should be carefully preprocessed, we observe more specifically that their impact depends on the complexity of the methods. With this work, we also want to raise the awareness of the importance of data quality, model understanding, and appropriate evaluation.

show abstract

Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them

Cited by 61 publications

References 20 publications

Evolutionary model building under streaming data for classification tasks: opportunities and challenges

Evolutionary model building under streaming data for classification tasks: opportunities and challenges

Concept Drift Detection with Hierarchical Hypothesis Testing

Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets

Contact Info

Product

Resources

About