2018
DOI: 10.1613/jair.1.11192

SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

Abstract: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to its simplicity in the design of the procedure, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has also sig…
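For readers unfamiliar with the procedure, the following is a minimal sketch of the interpolation idea the abstract refers to. It is illustrative only, not the authors' reference implementation; the function name smote_sketch and its parameters are chosen here for exposition.

```python
# Minimal sketch of SMOTE's core interpolation step (illustrative only).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_minority, n_synthetic, k=5, random_state=0):
    """Generate n_synthetic samples by interpolating each minority point
    toward one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(random_state)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    # Drop the first neighbor returned for each point: it is the point itself.
    neighbors = nn.kneighbors(X_minority, return_distance=False)[:, 1:]

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_minority))   # pick a minority sample at random
        j = rng.choice(neighbors[i])        # pick one of its k nearest neighbors
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.vstack(synthetic)
```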

Cited by 1,383 publications (673 citation statements). References 214 publications.
“…(c) How to select suitable attributes and determine their weights is another problem awaiting resolution. (d) Empirical studies of the three‐way decision module could be conducted, and approaches for dealing with imbalanced classification may be utilized (Branco, Torgo, & Ribeiro; Fernández, Garcia, Herrera, & Chawla; Torgo, Ribeiro, Pfahringer, & Branco). The abovementioned future work can help enrich our knowledge of applying advanced research to practical problems.…”
Section: Results
Citation type: mentioning; confidence: 99%
“…• Now, in the dataset, the number of data points for one class is much smaller than that of the other class, which corresponds to the COVID-19 patients. Therefore, to balance the data points, we use the synthetic minority oversampling technique (SMOTE) 23,24 . This algorithm creates an equal number of samples for each class.…”
Citation type: mentioning; confidence: 99%
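The balancing step described in the excerpt above can be sketched as follows, assuming an off-the-shelf implementation such as imbalanced-learn's SMOTE class (the cited study does not name its implementation, so the library, dataset, and settings below are illustrative assumptions).

```python
# Sketch of equalizing class counts with SMOTE via imbalanced-learn (assumed library).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy stand-in for an imbalanced clinical dataset with few positive cases.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
print("before:", Counter(y))            # heavily skewed toward class 0

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_res))         # both classes now have the same count
```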
“…It generates artificial samples from the minority class by interpolating existing instances that lie close together. Nowadays, it is one of the most popular data sampling methods [1] and it has motivated the development of other over-sampling algorithms [24]. Similarly, under-sampling methods have come to incorporate a heuristic component [25], some of the most outstanding examples being Tomek's Links (TL) [26], Edited Nearest Neighbor (ENN) [27], and the Condensed Nearest Neighbor rule (CNN) [28], among others [22,29-33].…”
Section: Introduction
Citation type: mentioning; confidence: 99%
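The heuristic under-sampling methods named in this excerpt (TL, ENN, CNN) have implementations in the imbalanced-learn library; a small sketch using that library (an assumption, since the quoted paper cites the original publications rather than a specific implementation) is:

```python
# Compare how much of the data each heuristic under-sampler keeps (illustrative).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import (
    TomekLinks, EditedNearestNeighbours, CondensedNearestNeighbour)

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
print("original:", Counter(y))

for sampler in (TomekLinks(), EditedNearestNeighbours(),
                CondensedNearestNeighbour(random_state=0)):
    X_res, y_res = sampler.fit_resample(X, y)
    print(type(sampler).__name__, Counter(y_res))
```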
“…It has been seen that class imbalance in big data has been addressed by adapting traditional techniques, mainly sampling methods [21,44]. However, recent studies show that some conclusions from machine learning are not applicable in the big data context; for example, in machine learning it is common for SMOTE to perform better than ROS [24], but in big data some results do not show this trend [1,48]. In addition, only a few works have addressed class imbalance in big data by using "intelligent" or heuristic sampling techniques [17,49].…”
Section: Introduction
Citation type: mentioning; confidence: 99%