2003
DOI: 10.1007/978-3-540-39804-2_12

SMOTEBoost: Improving Prediction of the Minority Class in Boosting

Abstract: Many real-world data mining applications involve learning from imbalanced data sets. Learning from data sets that contain very few instances of the minority (or interesting) class usually produces biased classifiers that have a higher predictive accuracy over the majority class(es), but poorer predictive accuracy over the minority class. SMOTE (Synthetic Minority Over-sampling TEchnique) is specifically designed for learning from imbalanced data sets. This paper presents a novel approach for learning…
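The interpolation step at the heart of SMOTE is compact enough to sketch. The snippet below is a minimal illustration of the idea the abstract names, not the paper's own pseudocode; the function name `smote_sample` and its parameters are hypothetical.

```python
# Minimal SMOTE sketch: each synthetic point lies on the segment between a
# minority-class sample and one of its k nearest minority-class neighbors.
# Function and parameter names are illustrative, not from the paper.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, n_synthetic, k=5, seed=None):
    """Generate n_synthetic points from minority-class samples X_min."""
    rng = np.random.default_rng(seed)
    # +1 neighbors because each point is returned as its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(len(X_min))   # pick a random minority sample
        j = rng.choice(idx[i, 1:])     # pick one of its k nearest neighbors
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic
```

For practical use, the imbalanced-learn library ships a full implementation as imblearn.over_sampling.SMOTE.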

Cited by 1,192 publications (708 citation statements)
References 14 publications
“…For example, Chawla et al (2003) and Chen et al (2004) found that imbalance between the proportion of presence and absence classes can cause bias in the prediction and model-fit. They found that when an imbalanced sample is present, the bootstrap of the data is biased towards the majority class, thus over-predicting the majority-class and under-predicting the minority.…”
Section: Introduction (mentioning)
confidence: 99%
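The over-prediction effect this excerpt describes is easy to reproduce. Below is a minimal sketch, not code from Chawla et al. (2003) or Chen et al. (2004); the dataset and model are arbitrary choices, used only to show that a classifier fit on a 95:5 sample recalls the majority class far better than the minority:

```python
# Illustrative only: a classifier trained on a 95:5 imbalanced sample
# tends to over-predict the majority class (class 0 here).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Per-class recall: the minority class (1) is typically much lower.
for c in (0, 1):
    mask = y_te == c
    print(f"class {c}: n={mask.sum()}, recall={(pred[mask] == c).mean():.2f}")
```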
“…The model was evaluated using F-measure, G-mean and Accuracy using seventeen imbalanced datasets including: Ionosphere, Hepatitis, Abalone, Yeast, Oil spills and Breast Cancer datasets. For each datasets the model was compared with C4.5, AdaBoostM1, DataBoost, CSB2, AdaCost [30] and SMOTEBoost [14]. The proposed model scored high on highly imbalanced datasets in terms of the F-measure and is comparable (in some instances higher) with other models when it comes to G-mean and Accuracy.…”
Section: Results (mentioning)
confidence: 99%
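For reference, the three metrics this excerpt names can be computed as follows. The arrays below are toy placeholders, and G-mean is taken as the geometric mean of the per-class recalls, the usual definition in the imbalanced-learning literature:

```python
# Hedged sketch of F-measure, G-mean, and Accuracy for a binary problem;
# y_true/y_pred are placeholder toy data.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1, 0, 1, 0, 0])

f_measure = f1_score(y_true, y_pred)                     # harmonic mean of precision and recall
sensitivity = recall_score(y_true, y_pred, pos_label=1)  # minority-class recall
specificity = recall_score(y_true, y_pred, pos_label=0)  # majority-class recall
g_mean = np.sqrt(sensitivity * specificity)              # geometric mean of the two recalls

print(f"F-measure={f_measure:.2f}, G-mean={g_mean:.2f}, "
      f"Accuracy={accuracy_score(y_true, y_pred):.2f}")
```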
“…Bagging [10] and boosting [30] are two popular methods for building ensembles of classifiers with a rich history of extensions [17,31,39,61,74,78]. In this section we outline various approaches which have been taken to make bagging and boosting methods overcome concept drift.…”
Section: Bagging and Boosting Based Methods (mentioning)
confidence: 99%
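As a point of reference for the two ensemble families this excerpt names, here is a minimal scikit-learn sketch; the synthetic dataset and hyperparameters are arbitrary choices for illustration:

```python
# Bagging vs. boosting in their stock scikit-learn forms, on toy data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Bagging: trains each tree on a bootstrap resample, then averages votes.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0)
# Boosting: reweights examples so later learners focus on earlier mistakes.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.2f}")
```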