2021
DOI: 10.1109/access.2021.3102399

A Comparative Performance Analysis of Data Resampling Methods on Imbalance Medical Data

Abstract: Medical datasets are usually imbalanced, where negative cases severely outnumber positive cases. Therefore, it is essential to deal with this data skew problem when training machine learning algorithms. This study uses two representative lung cancer datasets, PLCO and NLST, with imbalance ratios (the proportion of samples in the majority class to those in the minority class) of 24.7 and 25.0, respectively, to predict lung cancer incidence. This research uses the performance of 23 class imbalance met…
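The imbalance ratio defined in the abstract (majority-class count divided by minority-class count) can be computed directly from a label vector. The following is a minimal sketch, not the authors' code: the function name `imbalance_ratio` and the toy label counts are assumptions chosen only to reproduce a ratio of 24.7.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (majority-class count) / (minority-class count).

    Hypothetical helper for illustration; assumes at least two
    distinct classes are present in `labels`.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


# Toy data (assumed, not taken from PLCO or NLST):
# 247 negative cases and 10 positive cases.
toy_labels = [0] * 247 + [1] * 10
ratio = imbalance_ratio(toy_labels)
print(f"imbalance ratio = {ratio:.1f}")
```

A ratio near 25 means roughly one positive case for every 25 negative cases, which is the level of skew the abstract reports for both datasets.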

Cited by 167 publications
(83 citation statements)
References 82 publications
“…In this study, the following models were utilised for the performance measurement of institutions. Machine learning models are also widely used in the domain of healthcare [25][26][27], robotics [28,29], and business [30,31].…”
Section: Modelling (mentioning)
confidence: 99%
“…$\text{Accuracy} = \dfrac{\text{No. of correct predictions}}{\text{Total No. of predictions}}$ (25) For binary classification, the accuracy is measured using Equation (25) or Equation (26):…”
Section: Accuracy = (mentioning)
confidence: 99%
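The accuracy definition quoted above (correct predictions over total predictions) can be sketched as follows. This is an illustrative example only: the function names are assumptions, the citing paper's Equation (26) is not shown in the excerpt, and the confusion-matrix form used here is the standard binary formulation, which may or may not match it. The toy counts are invented to show why accuracy alone can mislead at an imbalance ratio of 25.

```python
def accuracy(y_true, y_pred):
    """Accuracy = number of correct predictions / total number of predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_accuracy(tp, tn, fp, fn):
    """Standard binary form: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Toy data (assumed): 250 negatives and 10 positives, imbalance ratio 25.
y_true = [0] * 250 + [1] * 10
# A trivial classifier that labels every case negative.
y_pred = [0] * 260

acc = accuracy(y_true, y_pred)
print(f"accuracy of the all-negative classifier: {acc:.4f}")

# Same result from confusion-matrix counts: TP=0, TN=250, FP=0, FN=10.
assert abs(acc - binary_accuracy(tp=0, tn=250, fp=0, fn=10)) < 1e-12
```

The all-negative classifier scores about 96% accuracy while detecting no positive case, which is the data skew problem the abstract describes.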
“…Although the IHT method helps resolve class overlap, it heavily depends on the performance of a single classifier for identifying IHP (Khushi et al., 2021). Employing a poorly performing base classifier for identifying IHP tends to eliminate a significant number of negative samples, leading to information loss.…”
Section: A Learning From Imbalanced Data (mentioning)
confidence: 99%
“…Alam [21] proposed a new model specified for imbalanced datasets of credit card default prediction. Khushi utilize the testing results of 20+ class imbalance models with three types of classifiers to detect the best imbalance techniques for medical datasets [22]. Some works explore the risk factors in machine learning models that influence the class identification in an imbalanced dataset [23][24][25].…”
Section: Related Work 21 Adversarial Neural Network (mentioning)
confidence: 99%