2023
DOI: 10.3390/s23042333
A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning

Abstract: Nowadays, the solution to many practical problems relies on machine learning tools. However, compiling the appropriate training data set for real-world classification problems is challenging because collecting the right amount of data for each class is often difficult or even impossible. In such cases, we can easily face the problem of imbalanced learning. There are many methods in the literature for solving the imbalanced learning problem, so it has become a serious question how to compare the performance of …

Citations: Cited by 51 publications (19 citation statements)
References: 44 publications
“…However, to address the challenge associated with severely imbalanced datasets, which could result in some folds not containing elements from all classes, the stratified cross‐validation method is used. This method preserves the percentage of samples from the majority and minority classes by splitting the dataset into k folds [33]. The stratified five‐fold cross‐validation ensures that the proportion of instances (healthy and heart failure recordings) is preserved in each partition.…”
Section: Methods
confidence: 99%
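The class-proportion preservation described in this citation statement can be reproduced with scikit-learn's StratifiedKFold. The following is a minimal sketch; the synthetic 90/10 labels, placeholder features, and 5-fold setting are illustrative assumptions, not values taken from the cited study:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative, heavily imbalanced labels: 90 majority (0) vs. 10 minority (1) samples
y = np.array([0] * 90 + [1] * 10)
X = np.random.rand(len(y), 4)  # placeholder feature matrix

# StratifiedKFold keeps the 90/10 class ratio in every train/test partition
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    print(f"fold {fold}: minority share "
          f"train={y[train_idx].mean():.2f}, test={y[test_idx].mean():.2f}")
```

Each fold reports roughly the same minority share (about 0.10), which is exactly the property the quoted passage relies on to keep every fold populated with both classes.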
“…Several classification algorithms were built to discover neurophenotypes underlying AN. Models were scored by accuracy and the area under the receiver operating characteristic curve (AUC ROC) using stratified cross-validation, which samples from the minority and majority classes in proportion to the original distribution, so that each fold has the same class distribution as the full dataset, yielding more robust validation (Szeghalmy & Fazekas, 2023). As our classes for group classification were imbalanced (HC = 27, AN = 65), which is known to cause bias (Teh, Armitage, Tesfaye, Selvarajah & Wilkinson, 2020), the synthetic minority oversampling technique (SMOTE) was used for oversampling.…”
Section: Methods
confidence: 99%
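A hedged sketch of how SMOTE can be combined with stratified cross-validation follows. The classifier, feature dimensionality, and random data are assumptions for illustration (only the HC = 27 / AN = 65 class sizes come from the quote), and oversampling is applied to the training folds only, which is the standard way to avoid leakage into the test folds:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE  # requires the imbalanced-learn package

rng = np.random.default_rng(0)
X = rng.normal(size=(92, 10))        # 92 illustrative subjects, 10 placeholder features
y = np.array([0] * 27 + [1] * 65)    # HC = 27 (minority), AN = 65 (majority)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = []
for train_idx, test_idx in skf.split(X, y):
    # Oversample the minority class in the training folds only
    X_res, y_res = SMOTE(random_state=0).fit_resample(X[train_idx], y[train_idx])
    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))

print(f"mean AUC ROC across folds: {np.mean(aucs):.3f}")
```

Because the test folds are never resampled, the reported AUC ROC reflects the original imbalanced distribution, which is consistent with the validation strategy the citing paper describes.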
“…Subsequently, eight different classification methods, detailed in Table 1, were used to construct prediction models. The stratified 5-fold cross-validation method [32,33] was adopted, with all features provided to the feature selection algorithms; a separate set of features was chosen for each fold.…”
Section: Feature Selection and Classification
confidence: 99%
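A minimal sketch of per-fold feature selection inside stratified 5-fold cross-validation is given below. The SelectKBest selector, the value of k, the classifier, and the synthetic data are illustrative assumptions; the eight classifiers and the specific selection algorithms of the citing paper are not reproduced here:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))              # illustrative data: 200 samples, 30 features
y = (rng.random(200) < 0.3).astype(int)     # imbalanced binary labels (~30% positives)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # Fit the feature selector on the training fold only, then reuse it on the test fold
    selector = SelectKBest(score_func=f_classif, k=10).fit(X[train_idx], y[train_idx])
    X_train, X_test = selector.transform(X[train_idx]), selector.transform(X[test_idx])
    clf = RandomForestClassifier(random_state=1).fit(X_train, y[train_idx])
    acc = accuracy_score(y[test_idx], clf.predict(X_test))
    print(f"fold {fold}: {selector.get_support().sum()} features selected, accuracy={acc:.2f}")
```

Selecting features separately inside each training fold, as the quoted passage describes, keeps the test fold out of the selection step and so gives an unbiased estimate of the full selection-plus-classification pipeline.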