2023
DOI: 10.3390/app13064006

Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks

Abstract: Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results. ID arises when the number of samples belonging to one class outnumbers that of the other by a wide margin, biasing such models' learning process toward the majority class. In recent years, several solutions have been put forward to address this issue, opting either to synthetically generate new data for the minority class or to reduce the number of majority-class samples.

Cited by 52 publications (15 citation statements) · References 77 publications
“…In cases of imbalanced learning problems, synthetic data augmentation to rebalance class distributions provides a meaningful benefit. This technique performs better than a simple oversampling approach in this study, but comparisons to other sampling methods [66][67][68] might be an interesting direction for further research. Although we could not confirm an overall improvement for a downstream classification task in this study, we did not cherry-pick model configurations where generative models exhibit sizable improvements over baseline models, opting instead to give a comprehensive and robust outlook on the expected performance increase across many different scenarios.…”
Section: Discussion
confidence: 80%
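For readers who want to reproduce this kind of comparison, a minimal sketch in Python follows, assuming scikit-learn and imbalanced-learn are available; the synthetic dataset, class ratio, and classifier are illustrative stand-ins, not the cited study's setup.

```python
# Sketch: compare simple random oversampling against SMOTE on a synthetic
# imbalanced problem. Dataset shape and model choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler, SMOTE

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("random oversampling", RandomOverSampler(random_state=0)),
                      ("SMOTE", SMOTE(random_state=0))]:
    # Rebalance the training split only; the test split stays untouched.
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    print(name, f1_score(y_te, clf.predict(X_te)))
```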
“…This increases variants in the image space for minority classes during oversampling, while keeping the majority class largely as is. The large increase in image variants mimics synthetic creation, for example through interpolation [11,12] or GAN-based approaches [51,74–76]. Since the majority class is often not augmented in these methods, we use only weak augmentations to produce realistic-looking images.…”
Section: Methods
confidence: 99%
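A minimal sketch of this oversample-with-weak-augmentation idea follows, assuming torchvision; the specific transforms and the oversample_minority helper are illustrative assumptions, not the cited pipeline.

```python
# Sketch: oversample a minority image class by drawing with replacement and
# applying weak augmentations, so repeats differ in image space rather than
# being exact copies. Transform choices are assumptions.
import random
from torchvision import transforms

weak_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=5),    # small rotation keeps images realistic
    transforms.ColorJitter(brightness=0.1),  # mild photometric change
])

def oversample_minority(minority_images, target_count):
    """Grow the minority set to target_count by augmenting random draws."""
    out = list(minority_images)
    while len(out) < target_count:
        out.append(weak_augment(random.choice(minority_images)))
    return out
```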
“…While the method is simple and can be applied to many domains, the repeated drawing of the same sample can lead to overfitting [49]. To counter this, more complex methods like SMOTE [11,12] or ADASYN [13] create synthetic samples of the minority class by interpolating between nearest neighbors. Generative adversarial networks (GANs) [50] have also been used to create synthetic samples to increase minority classes [14,51].…”
Section: Related Work
confidence: 99%
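To make the interpolation step concrete, here is a from-scratch sketch of the SMOTE idea in Python; the smote_like helper and its parameters are hypothetical, and production code should prefer a maintained implementation such as imbalanced-learn's SMOTE or ADASYN.

```python
# Sketch of SMOTE-style synthesis: for each new sample, pick a random minority
# point, pick one of its k nearest minority neighbors, and take a random point
# on the segment connecting them.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    # k + 1 because each point's nearest neighbor is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))      # random minority sample
        j = rng.choice(idx[i][1:])        # one of its k nearest neighbors
        lam = rng.random()                # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.asarray(synth)
```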
“…An examination of the total dataset revealed 18,124 entries as normal and 959 as defective, representing a significant imbalance at a ratio of approximately 19:1. We used the Synthetic Minority Oversampling Technique (SMOTE) [29,30] to solve this imbalance problem. SMOTE generates new synthetic samples by utilizing the differences between the data points of the minority class, proving more effective than simple duplication in oversampling scenarios.…”
Section: Data Preprocessing
confidence: 99%
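A minimal sketch of this rebalancing step follows, assuming imbalanced-learn's SMOTE; the placeholder feature matrix simply mirrors the reported ~19:1 ratio (18,124 normal vs. 959 defective) and is not the paper's data.

```python
# Sketch: rebalance a ~19:1 binary dataset with SMOTE. Features are random
# placeholders standing in for the real preprocessed data.
from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(18124 + 959, 8))      # placeholder features
y = np.array([0] * 18124 + [1] * 959)      # 0 = normal, 1 = defective

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y))      # Counter({0: 18124, 1: 959})
print(Counter(y_res))  # Counter({0: 18124, 1: 18124}) after oversampling
```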