2024
DOI: 10.1038/s41598-024-55598-1

A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data

Amir Reza Salehi,
Majid Khedmati

Abstract: In this paper, a Cluster-based Synthetic minority oversampling technique (SMOTE) Both-sampling (CSBBoost) ensemble algorithm is proposed for classifying imbalanced data. In this algorithm, a combination of over-sampling, under-sampling, and different ensemble algorithms, including Extreme Gradient Boosting (XGBoost), random forest, and bagging, is employed in order to achieve a balanced dataset and address the issues including redundancy of data after over-sampling, information loss in under-sampling, and rand…
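The abstract outlines the general recipe: cluster-based under-sampling of the majority class, SMOTE over-sampling of the minority class, and an ensemble classifier trained on the balanced result. The sketch below illustrates that pipeline with scikit-learn and imbalanced-learn, using a random forest as a stand-in for the paper's XGBoost/bagging ensemble; the function names, cluster count, and per-cluster sample size are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of a cluster-based under-sampling + SMOTE + ensemble pipeline.
# Illustrative only; parameters and helper names are assumptions, not the
# CSBBoost reference implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE


def cluster_undersample(X_maj, n_clusters=5, keep_per_cluster=50, seed=0):
    """Cluster the majority class and keep a random subset from each cluster,
    so that different regions of the majority class stay represented."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_maj)
    keep = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        keep.append(rng.choice(members, size=min(keep_per_cluster, members.size), replace=False))
    return np.concatenate(keep)


def balance_and_fit(X, y, minority_label=1, seed=0):
    minority = (y == minority_label)
    X_min, y_min = X[minority], y[minority]
    X_maj, y_maj = X[~minority], y[~minority]

    # Step 1: cluster-based under-sampling of the majority class.
    keep_idx = cluster_undersample(X_maj, seed=seed)
    X_red = np.vstack([X_maj[keep_idx], X_min])
    y_red = np.concatenate([y_maj[keep_idx], y_min])

    # Step 2: SMOTE over-sampling of the minority class on the reduced data
    # (the minority class must have more samples than SMOTE's k_neighbors, 5 by default).
    X_bal, y_bal = SMOTE(random_state=seed).fit_resample(X_red, y_red)

    # Step 3: train an ensemble classifier on the balanced data.
    return RandomForestClassifier(random_state=seed).fit(X_bal, y_bal)
```

Combining both sampling directions this way keeps the resampled dataset moderate in size: under-sampling limits how many synthetic minority points SMOTE must generate, while clustering the majority class before discarding points reduces the information loss of plain random under-sampling.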

Cited by 6 publications (1 citation statement). References 44 publications (47 reference statements).
“…To address this challenge, the field of ML has developed advanced strategies [39, 40]. Class balancing techniques, such as random sampling [41], synthetic minority oversampling technique (SMOTE) [42, 43], and other methodologies [44], have shown promise in reducing data disparities through improved ML model training [45]. For instance, these approaches have already been used in a variety of research contexts, including medical diagnosis, gait and image analysis [46, 47, 48, 49, 50], and have shown significant improvements in disease detection and clinical outcome prediction [51, 52].…”
Section: Introduction
Confidence: 99%