2016
DOI: 10.1587/transinf.2016edp7130
Cluster-Based Minority Over-Sampling for Imbalanced Datasets

Kamthorn PUNTUMAPON (Student Member), Thanawin RAKTHAMAMON (Nonmember), and Kitsana WAIYAMAI (Member)

Abstract: Synthetic over-sampling is a well-known method to solve class imbalance by modifying the class distribution and generating synthetic samples. A large number of synthetic over-sampling techniques have been proposed; however, most of them suffer from the over-generalization problem, whereby synthetic minority-class samples are generated inside the majority-class region. Learning from an over-generali…
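The over-generalization problem the abstract describes can be shown with a minimal 1-D sketch (toy numbers chosen for illustration, not taken from the paper): SMOTE-style interpolation between minority points that belong to two different clusters can place a synthetic sample in the middle of the majority region.

```python
import numpy as np

# Two separated minority clusters with majority samples in between
# (1-D for readability). Interpolating between minority points drawn
# from different clusters generates a synthetic point inside the
# majority region -- the over-generalization problem.
minority = np.array([0.0, 0.1, 0.2, 2.8, 2.9, 3.0])
majority = np.array([1.3, 1.5, 1.7])

a, b = 0.2, 2.8        # minority points from the two different clusters
w = 0.5                # interpolation weight; SMOTE draws it from (0, 1)
synthetic = a + w * (b - a)                                      # = 1.5
inside_majority = majority.min() <= synthetic <= majority.max()  # True here
```

Cluster-based approaches such as the one in this paper restrict interpolation so that such cross-cluster segments are never used.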

Cited by 19 publications (5 citation statements); references 14 publications.
“…Similarly, Seoane Santos et al. [38] handled patient data by clustering the minority-class instances and then rebalancing the data using SMOTE. Puntumapon et al. [39] proposed a new method called TRIM as a preprocessing stage before applying over-sampling methods such as SMOTE or one of its extensions. Lim et al. [40] implemented an evolutionary ensemble learning framework by clustering the minority-class instances using mini-batch k-means and hierarchical agglomerative clustering before generating synthetic samples.…”
Section: Related Work
confidence: 99%
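The cluster-then-oversample pattern shared by [38]–[40] can be sketched in a few lines. This is a simplified illustration, not any of the cited authors' implementations: `cluster_oversample` is a hypothetical name, and the tiny k-means loop is a stand-in for whatever clustering method a given paper uses.

```python
import numpy as np

def cluster_oversample(X_min, n_new, n_clusters=2, rng=None):
    """Cluster the minority class with a small k-means (Lloyd's algorithm),
    then interpolate only between points of the same cluster, so synthetic
    samples cannot land on segments bridging two distant minority groups."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    centers = X_min[rng.choice(len(X_min), n_clusters, replace=False)]
    for _ in range(20):  # a few Lloyd iterations are enough for a sketch
        labels = np.argmin(((X_min[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            pts = X_min[labels == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)
    synthetic = []
    for _ in range(n_new):
        k = int(rng.integers(n_clusters))
        pts = X_min[labels == k]
        if len(pts) < 2:
            continue  # degenerate cluster: skip rather than extrapolate
        a, b = pts[rng.choice(len(pts), 2, replace=False)]
        synthetic.append(a + rng.random() * (b - a))
    return np.array(synthetic).reshape(-1, X_min.shape[1])
```

Because every synthetic point is a convex combination of two same-cluster minority points, the output always stays inside the minority class's own regions.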
“…Such an imbalance affected the accuracy, in that the algorithm could assign the defect-free label to defective data. To overcome this problem, the data-processing techniques known as over-sampling and under-sampling were introduced [38]. The over-sampling method generates artificial defect data along the line segment between other defective samples.…”
Section: Results
confidence: 99%
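The line-segment generation described above is the core of SMOTE. A minimal sketch, assuming a brute-force NumPy nearest-neighbor search (`smote_like` is an illustrative name, not the cited implementation):

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Each synthetic sample lies on the line segment between a minority
    point and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    # Pairwise distances within the minority class only.
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors per point
    out = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = int(rng.integers(n))                 # pick a seed minority point
        nb = X_min[nn[i, int(rng.integers(k))]]  # one of its k neighbors
        out[j] = X_min[i] + rng.random() * (nb - X_min[i])  # point on segment
    return out
```

Restricting neighbors to the minority class is what makes this over-sampling rather than noise injection; under-sampling, by contrast, simply discards majority samples.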
“…TRIM is a method that aims to avoid over-generalization of the data. The basic idea is to identify the collection of minority data with the best compromise between generalizability and precision (Puntumapon et al., 2016). Equation (1) is used to measure the precision and generalization of the data.…”
Section: Methods
confidence: 99%