2012
DOI: 10.1016/j.knosys.2011.06.013
On the effectiveness of preprocessing methods when dealing with different levels of class imbalance

Cited by 297 publications (153 citation statements)
References 48 publications
“…García et al. [22] applied four resampling algorithms and eight different classifiers to 17 real datasets. Their experiments showed that oversampling the minority class outperforms undersampling the majority class when datasets are strongly imbalanced, and that there are no significant differences for data with a low imbalance.…”
Section: B. Oversampling
confidence: 99%
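
To make the comparison in this statement concrete, the following is a minimal sketch using scikit-learn and the imbalanced-learn library; the library choice, synthetic dataset, classifier, and all parameters are illustrative assumptions, not the experimental setup of García et al.

```python
# Minimal sketch: random oversampling vs. random undersampling on a
# synthetic, strongly imbalanced dataset. All choices here (library,
# data, classifier, parameters) are assumptions for demonstration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# Two-class problem with roughly 5% minority class.
X, y = make_classification(n_samples=2000, weights=[0.95], flip_y=0.02,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

for name, sampler in [("oversampling", RandomOverSampler(random_state=42)),
                      ("undersampling", RandomUnderSampler(random_state=42))]:
    # Resample the training split only; the test split keeps its
    # original class distribution.
    X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)
    clf = KNeighborsClassifier().fit(X_bal, y_bal)
    score = balanced_accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: balanced accuracy = {score:.3f}")
```
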
“…Chris Seiffert et al. [13] have examined a new hybrid sampling/boosting algorithm called RUSBoost, comparing it against its individual component AdaBoost and against SMOTEBoost, another algorithm that combines boosting and data sampling for learning from skewed training data. V. García et al. [14] have investigated the influence of both the imbalance ratio and the classifier on the performance of several resampling strategies for dealing with imbalanced data sets. The study focuses on evaluating how learning is affected when different resampling algorithms transform the originally imbalanced data into artificially balanced class distributions.…”
Section: Literature Survey on Imbalanced Datasets
confidence: 99%
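
For reference, imbalanced-learn ships an implementation of the RUSBoost idea (random undersampling of the majority class inside each boosting round). A minimal sketch follows; the dataset and hyperparameters are assumptions for illustration, not those used by Seiffert et al. [13].

```python
# Minimal RUSBoost sketch: AdaBoost-style boosting where each round
# randomly undersamples the majority class. Dataset and hyperparameters
# are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from imblearn.ensemble import RUSBoostClassifier

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
clf = RUSBoostClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
print(f"RUSBoost balanced accuracy: {scores.mean():.3f}")
```
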
“…The fundamentals of employing ML for effective and improved Software Fault Prediction (SFP) are the consideration of different software metrics [4]–[7], [11]–[14], [29,30], Feature Selection (FS) [1,8,9], [15]–[17], [25], [28,34], and Data Balancing (DB) [9,18], [35]–[38]. Many software metrics have been proposed for SFP, but we prefer to group the studies according to the most frequently used metrics: Chidamber and Kemerer's (CK) Object-Oriented Metrics (OOM) and McCabe and Halstead Static Code Metrics (SCM) [11].…”
Section: Introduction
confidence: 99%
“…The other is to re-sample the original dataset, either by over-sampling the minority class and/or under-sampling the majority class [18,35,36]. As shown in different studies [35]–[38], balancing data using the Synthetic Minority Oversampling Technique (SMOTE) gives better classification performance. Therefore, this study explores the SMOTE algorithm for building SFP models with datasets that suffer from class imbalance.…”
Section: Introduction
confidence: 99%
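
To make the SMOTE step concrete, here is a minimal sketch using imbalanced-learn's SMOTE; the statement above does not specify an implementation, so the library choice, dataset, and parameters are assumptions for illustration.

```python
# Minimal SMOTE sketch: synthesize new minority-class samples by
# interpolating between a minority sample and one of its k nearest
# minority-class neighbours. Parameters are illustrative assumptions.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
print("before:", Counter(y))          # roughly 900 majority / 100 minority
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))      # classes balanced 1:1
```

In practice, SMOTE (like any resampling) should be applied only to the training folds, never to held-out test data, so that evaluation reflects the original class distribution.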