2017
DOI: 10.1155/2017/1827016

A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM

Abstract: Class imbalance exists ubiquitously in real life and has attracted much interest from various domains. Learning directly from an imbalanced dataset may yield unsatisfying results, over-focusing on overall identification accuracy and deriving a suboptimal model. Various methodologies have been developed to tackle this problem, including sampling, cost-sensitive, and hybrid approaches. However, the samples near the decision boundary, which contain more discriminative information, should be valued, and the skew of the bo…
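The extrapolation idea named in the title can be illustrated with a minimal sketch: a synthetic minority point is generated on the ray from a majority-class neighbour through a minority support vector, pushing it outward toward the boundary region. The neighbour choice, the step size `lam`, and the function name here are illustrative assumptions, not the authors' exact algorithm.

```python
# Hedged sketch of boundary-oriented extrapolation, assuming feature
# vectors are plain Python lists. Real methods select support vectors
# via a trained SVM and choose neighbours with k-NN.
def extrapolate(x_sv, x_maj, lam=0.5):
    """Synthetic point beyond x_sv, on the ray from x_maj through x_sv."""
    return [sv + lam * (sv - mj) for sv, mj in zip(x_sv, x_maj)]

x_sv = [1.0, 2.0]   # minority support vector (illustrative)
x_maj = [0.0, 0.0]  # nearest majority-class sample (illustrative)
print(extrapolate(x_sv, x_maj))  # [1.5, 3.0]
```

With `lam > 0` the synthetic point moves away from the majority neighbour, whereas classic SMOTE interpolates between two minority samples instead.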

Cited by 117 publications (50 citation statements)
References 35 publications
“…Low occurrence rates in relatively small datasets lead to large class-imbalances that are a significant challenge in medical machine learning. 22,23 To this end, we have trained several supervised machine learning classifiers to predict the probability of postoperative complications in a relatively small dataset (<15,000 patients) that can accurately learn complications with relatively low occurrence rates (<1%). We have rigorously developed and tested our models by employing the best practices in machine learning in this study by performing automated feature selection, L2 regularization, testing on blinded hold-out data sets, and comparing to a standard risk-scoring system to ensure a high standard that is necessary for implementation of machine learning in clinical settings.…”
Section: Discussion
confidence: 99%
“…A thing to take note when using supervised method for training is imbalanced data: The predictive models developed using conventional machine learning algorithms could be biased and inaccurate because the number of observations in one class of the dataset is significantly lower than the other. To handle imbalanced data, several methods can be used, including resampling, boosting, bagging [17][18][19][20].…”
Section: Supervised Model
confidence: 99%
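The resampling-plus-bagging combination mentioned in the statement above can be sketched as a balanced bootstrap: each bagging round draws an equal number of samples per class before training its base classifier. This is a generic illustration of the idea, not the cited papers' exact procedure; all names here are hypothetical.

```python
# Minimal sketch of one balanced bootstrap draw for a bagging round,
# undersampling the majority class down to the minority count.
import random
from collections import Counter

def balanced_bootstrap(X, y, rng):
    """Bootstrap sample with equal counts per class."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n = min(len(v) for v in by_class.values())  # minority count
    Xb, yb = [], []
    for label, items in by_class.items():
        for _ in range(n):  # sample with replacement
            Xb.append(rng.choice(items))
            yb.append(label)
    return Xb, yb

rng = random.Random(0)
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2  # imbalanced: 8 majority vs 2 minority
Xb, yb = balanced_bootstrap(X, y, rng)
print(Counter(yb))  # both classes appear twice
```

In a full bagging ensemble this draw would be repeated once per base learner, and predictions would be combined by majority vote.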
“…ADASYN [He, Bai, Garcia et al (2008)] is an important improvement of SMOTE, which generates the synthetic examples by the proportion of the majority ratio. SVM-SMOTE [Nguyen, Cooper and Kamei (2011);Wang, Luo, Huang et al (2017)] generates artificial support vectors by SMOTE and gets good experimental results. Although these algorithms have different generating tricks, the core generating method is still the selected line segment way.…”
Section: Related Work
confidence: 99%
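The "selected line segment way" that this statement describes as the core of SMOTE, ADASYN, and SVM-SMOTE places a synthetic sample at a random position on the segment between two minority samples. A minimal sketch of that shared generator (the pairing strategy is what differs per algorithm):

```python
# SMOTE's core generator, assuming feature vectors as plain lists:
# pick a random gap lam in [0, 1) and interpolate along the segment.
import random

def smote_line_segment(x_i, x_j, rng):
    """Random point on the segment between minority samples x_i and x_j."""
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x_i, x_j)]

rng = random.Random(42)
s = smote_line_segment([0.0, 0.0], [2.0, 4.0], rng)
# each coordinate of s lies between the corresponding parent coordinates
print(all(0.0 <= v <= 4.0 for v in s))  # True
```

ADASYN reuses this generator but allocates more synthetic points to minority samples surrounded by majority neighbours, while SVM-SMOTE restricts the parents to support vectors near the decision boundary.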