SMOTE: Synthetic Minority Over-sampling Technique
2002 · DOI: 10.1613/jair.953

Abstract: An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the…


Cited by 22,221 publications (12,239 citation statements) · References 25 publications
“…At the data level, various re-sampling techniques are applied to balance the class distribution, including over-sampling minority-class instances and under-sampling majority-class instances [5], [6], [7], [8]. In particular, SMOTE (Synthetic Minority Over-sampling Technique) [1] is a popular approach for generating new minority-class data, which can expand the decision boundary towards the majority class. At the algorithm level, solutions are proposed by adjusting the algorithm itself: adjusting the costs of the various classes to counter the class imbalance, adjusting the decision threshold, and using recognition-based (i.e., learning from one class) rather than discrimination-based (two-class) learning.…”
Section: Introduction
confidence: 99%
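The data-level remedy quoted above, under-sampling the majority class, is simple enough to sketch. Below is a minimal, illustrative NumPy version; the function name and toy data are assumptions for illustration, not code from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_undersample(X, y, majority_label=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    keep = rng.choice(maj, size=len(mino), replace=False)  # subsample majority
    idx = np.concatenate([keep, mino])
    return X[idx], y[idx]

# Toy imbalanced data: 90 majority examples, 10 minority examples.
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)
Xb, yb = random_undersample(X, y)  # balanced: 10 of each class
```

Random under-sampling discards information from the majority class, which is why the quoted passage pairs it with over-sampling approaches such as SMOTE.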
“…These methods operate by taking a base learning algorithm and invoking it many times with different training sets. Accordingly, several algorithms based on these two ensemble models have been proposed by changing their re-sampling methods, such as BEV (Bagging Ensemble Variation) [11], SMOTEBoost [1], and DataBoost [12]. More details are given in Section 2.…”
Section: Introduction
confidence: 99%
“…To prevent misleading results due to unbalanced classes, the Synthetic Minority Over-Sampling Technique (SMOTE) [21] was used to balance the classes. Once this was done, a set of machine learning algorithms was selected based on the state-of-the-art tools used in the field, and experiments were conducted to determine which performs best in this specific case.…”
Section: Predictive Models
confidence: 99%
“…In this paper we address the problem of our imbalanced data in two ways: firstly by using data based sampling techniques [16,17] and secondly by using different SVM error costs for the two classes [18].…”
Section: Techniques For Learning Imbalanced Datasets
confidence: 99%
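The second remedy in the excerpt, different error costs for the two classes, can be emulated at prediction time by shifting the decision threshold on estimated class probabilities. A minimal sketch, assuming a classifier that outputs P(y=1|x); the helper name and numbers are hypothetical:

```python
import numpy as np

def cost_sensitive_threshold(c_fn, c_fp):
    """Bayes-optimal threshold on P(y=1|x) when a false negative costs
    c_fn and a false positive costs c_fp: predict positive iff
    p > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)

probs = np.array([0.1, 0.3, 0.45, 0.8])   # hypothetical P(y=1|x) scores
t = cost_sensitive_threshold(c_fn=9.0, c_fp=1.0)  # FN 9x costlier -> t = 0.1
preds = (probs > t).astype(int)
```

Making false negatives costlier lowers the threshold, so more borderline examples are flagged as the rare (positive) class, the same effect the cited work achieves by weighting the SVM error terms.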
“…Under-sampling the majority class can be done by simply selecting a random subset of the class. Over-sampling the minority class is not so simple; here we use the Synthetic Minority Oversampling Technique (SMOTE) [16]. For each member of the minority class, its nearest neighbours in the same class are identified and new instances are created, placed randomly between the instance and its neighbours.…”
Section: Sampling Techniques
confidence: 99%
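The neighbour-interpolation step described in this excerpt can be sketched in a few lines of NumPy. This is an illustrative reimplementation under stated assumptions (Euclidean distance, k nearest minority-class neighbours), not the paper's reference code:

```python
import numpy as np

rng = np.random.default_rng(42)

def smote_sample(X_min, k=3, n_new=10):
    """For each synthetic point: pick a minority example, find its k
    nearest minority-class neighbours, and interpolate at a random
    fraction of the way towards one of them."""
    n = len(X_min)
    # Pairwise squared Euclidean distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude each point itself
    nn = np.argsort(d2, axis=1)[:, :k]    # k nearest neighbours per point
    out = []
    for _ in range(n_new):
        i = rng.integers(n)               # a random minority example
        j = nn[i, rng.integers(k)]        # one of its k neighbours
        gap = rng.random()                # random position on the segment
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(out)

X_min = rng.normal(size=(20, 2))          # toy minority-class points
synthetic = smote_sample(X_min, k=3, n_new=5)
```

Because each synthetic point lies on a segment between two real minority examples, the generated data stays inside the minority region rather than duplicating existing points, which is what lets SMOTE broaden the learned decision boundary.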