2023
DOI: 10.3390/app13064006

Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks

Abstract: Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results. ID arises when the number of samples belonging to one class outnumbers that of the other by a wide margin, biasing such models' learning process toward the majority class. In recent years, several solutions have been put forward to address this issue, opting either to synthetically generate new data for the minority class or to reduce the number of majority-class samples.

Cited by 52 publications (15 citation statements) · References 77 publications
“…In cases of imbalanced learning problems, synthetic data augmentation to rebalance class distributions provides a meaningful benefit. This technique performs better than a simple oversampling approach in this study, but comparisons to other sampling methods [66][67][68] might be an interesting direction for further research. Although we could not confirm an overall improvement for a downstream classification task in this study, we did not cherry-pick model configurations where generative models exhibit sizable improvements over baseline models, opting instead to give a comprehensive and robust outlook on the expected performance increase across many different scenarios.…”
Section: Discussion
confidence: 80%
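For readers who want to reproduce this kind of comparison, a minimal sketch in Python follows, assuming scikit-learn and imbalanced-learn are available; the synthetic dataset, class ratio, and classifier are illustrative stand-ins, not the cited study's setup.

```python
# Sketch: compare simple random oversampling against SMOTE on a synthetic
# imbalanced problem. Dataset shape and model choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler, SMOTE

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("random oversampling", RandomOverSampler(random_state=0)),
                      ("SMOTE", SMOTE(random_state=0))]:
    # Rebalance the training split only; the test split stays untouched.
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    print(name, f1_score(y_te, clf.predict(X_te)))
```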
“…This increases variants in the image space for minority classes during oversampling, while keeping the majority class largely as is. The large increase in image variants mimics synthetic creation, for example through interpolation [11,12] or GAN-based approaches [51,74–76]. Since the majority class is often not augmented in these methods, we use only weak augmentations to produce realistic-looking images.…”
Section: Methods
confidence: 99%
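A minimal sketch of this oversample-with-weak-augmentation idea follows, assuming torchvision; the specific transforms and the oversample_minority helper are illustrative assumptions, not the cited pipeline.

```python
# Sketch: oversample a minority image class by drawing with replacement and
# applying weak augmentations, so repeats differ in image space rather than
# being exact copies. Transform choices are assumptions.
import random
from torchvision import transforms

weak_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=5),    # small rotation keeps images realistic
    transforms.ColorJitter(brightness=0.1),  # mild photometric change
])

def oversample_minority(minority_images, target_count):
    """Grow the minority set to target_count by augmenting random draws."""
    out = list(minority_images)
    while len(out) < target_count:
        out.append(weak_augment(random.choice(minority_images)))
    return out
```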
“…While the method is simple and can be applied to many domains, the repeated drawing of the same sample can lead to overfitting [49]. To counter this, more complex methods like SMOTE [11,12] or ADASYN [13] create synthetic samples of the minority class by interpolating between nearest neighbors. Generative adversarial networks (GANs) [50] have also been used to create synthetic samples to increase minority classes [14,51].…”
Section: Related Work
confidence: 99%
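To make the interpolation step concrete, here is a from-scratch sketch of the SMOTE idea in Python; the smote_like helper and its parameters are hypothetical, and production code should prefer a maintained implementation such as imbalanced-learn's SMOTE or ADASYN.

```python
# Sketch of SMOTE-style synthesis: for each new sample, pick a random minority
# point, pick one of its k nearest minority neighbors, and take a random point
# on the segment connecting them.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    # k + 1 because each point's nearest neighbor is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))      # random minority sample
        j = rng.choice(idx[i][1:])        # one of its k nearest neighbors
        lam = rng.random()                # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.asarray(synth)
```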
“…An examination of the total dataset revealed 18,124 entries as normal and 959 as defective, representing a significant imbalance at a ratio of approximately 19:1. We used the Synthetic Minority Oversampling Technique (SMOTE) [29,30] to solve this imbalance problem. SMOTE generates new synthetic samples by utilizing the differences between the data points of the minority class, proving more effective than simple duplication in oversampling scenarios.…”
Section: Data Preprocessing
confidence: 99%
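A minimal sketch of this rebalancing step follows, assuming imbalanced-learn's SMOTE; the placeholder feature matrix simply mirrors the reported ~19:1 ratio (18,124 normal vs. 959 defective) and is not the paper's data.

```python
# Sketch: rebalance a ~19:1 binary dataset with SMOTE. Features are random
# placeholders standing in for the real preprocessed data.
from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(18124 + 959, 8))      # placeholder features
y = np.array([0] * 18124 + [1] * 959)      # 0 = normal, 1 = defective

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y))      # Counter({0: 18124, 1: 959})
print(Counter(y_res))  # Counter({0: 18124, 1: 18124}) after oversampling
```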