The 6th International Conference on Soft Computing and Intelligent Systems, and the 13th International Symposium on Advanced Intelligent Systems (SCIS-ISIS), 2012
DOI: 10.1109/scis-isis.2012.6505291
A comparative study on sampling techniques for handling class imbalance in streaming data

Abstract: Sampling is the most popular approach for handling the class imbalance problem in training data. A number of studies have recently adapted sampling techniques to dynamic learning settings in which the training set is not fixed but grows gradually over time. This paper presents an empirical study comparing over-sampling and under-sampling techniques in the context of data streams. Experimental results show that under-sampling performs better than over-sampling at smaller training set sizes. All sampling tec…
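The two techniques the abstract compares can be illustrated with a minimal sketch. This is not the paper's exact procedure, just the standard random variants applied to a hypothetical toy chunk of a stream: under-sampling discards majority examples down to the minority count, while over-sampling duplicates minority examples up to the majority count.

```python
import random

def undersample(majority, minority, seed=0):
    """Randomly discard majority examples until both classes are equal in size."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))
    return kept + minority

def oversample(majority, minority, seed=0):
    """Randomly duplicate minority examples until both classes are equal in size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Hypothetical stream chunk: 10 majority ("neg") vs 3 minority ("pos") examples.
majority = [("neg", i) for i in range(10)]
minority = [("pos", i) for i in range(3)]

balanced_small = undersample(majority, minority)  # 6 examples total
balanced_large = oversample(majority, minority)   # 20 examples total
```

The trade-off the paper studies is visible in the sizes: under-sampling yields a much smaller balanced set (relevant to its finding about smaller training set sizes), while over-sampling keeps every majority example at the cost of repeated minority rows.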

Cited by 32 publications (14 citation statements)
References 16 publications
“…The following methods are employed for the sampling process in this work: 1) Resample is a method that develops a unique dataset, which is replaced with a sample [30].…”
Section: International Journal of Recent Technology and Engineering — mentioning
confidence: 99%
“…Oversampling 20,21 and undersampling 22 are often adopted to solve the problem of sample imbalance. Such methods usually copy the small-class samples until their number equals that of the large-class samples, or reduce the number of large-class samples, which easily causes data redundancy or loss of partial information.…”
Section: Fine-tuning and Classification Based on AOS-Softmax — mentioning
confidence: 99%
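The redundancy/information-loss trade-off described in the statement above can be made concrete. This is a sketch with hypothetical data, not code from either paper: duplication-based over-sampling produces repeated rows (redundancy), while under-sampling permanently discards majority rows (loss of partial information).

```python
import collections
import random

rng = random.Random(42)
majority = list(range(100))          # 100 distinct majority examples
minority = list(range(1000, 1005))   # 5 distinct minority examples

# Over-sampling by duplication: copy minority items until classes are balanced.
oversampled = minority + [rng.choice(minority)
                          for _ in range(len(majority) - len(minority))]
dup_counts = collections.Counter(oversampled)
# Each minority example now appears many times -> data redundancy.

# Under-sampling: keep only 5 of the 100 majority examples.
undersampled = rng.sample(majority, len(minority))
# The other 95 majority examples are discarded -> loss of partial information.
```

With only 5 distinct minority values spread across 100 over-sampled rows, each value repeats about 20 times on average, which is exactly the redundancy the quoted statement warns about.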
“…OU is also effective compared to (i) other undersampling and oversampling methods (Nguyen, Cooper, and Kamei 2012) and (ii) various types of bootstrap aggregation, boosting, and hybrid ensemble data rarity methods used in the data analytics literature (Galar, Fernández, Barrenechea, Bustince, and Herrera 2012). Nguyen et al (2012) suggests that OU improves performance not only because it makes the ratio of minority (fraud) to majority (non-fraud) observations more balanced, but also because it more efficiently incorporates potentially useful majority cases. By increasing the balance between fraud and non-fraud cases, OU adjusts the focus of the classification algorithms towards the fraud cases.…”
Section: Preprint Accepted Manuscript — mentioning
confidence: 99%