2022
DOI: 10.3390/axioms11110607

A Method for Analyzing the Performance Impact of Imbalanced Binary Data on Machine Learning Models

Abstract: Machine learning models may fail to learn from, and predict on, imbalanced data effectively, a recurring problem in machine learning and data mining. This study proposes a method for analyzing the performance impact of imbalanced binary data on machine learning models. It systematically analyzes (1) the relationship between model performance and the imbalance rate (IR), and (2) the performance stability of machine learning models on imbalanced binary data. In the proposed method, the imbalanced d…

Cited by 18 publications (7 citation statements)
References 34 publications
“…Class imbalance occurs when the number of samples in one class is far smaller than the number of samples in the other class. This can make machine learning models inefficient when dealing with imbalanced data [16]. In this experiment, the dataset is imbalanced because the damaged-road class has only 971 samples, while the undamaged-road class has 1226 samples. … was chosen as one of the OBIA methods, varying the number of clusters before classification with a CNN.…”
Section: Nilai Maksimum Cluster (Maximum Cluster Value)
Citation type: unclassified
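To put the class counts quoted in this statement in terms of the imbalance rate (IR) named in the abstract, the following is a minimal sketch. It assumes IR is the majority-class count divided by the minority-class count, which is a common convention; the truncated abstract does not give the paper's exact definition, and the function name `imbalance_ratio` is hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count.

    Assumption: this is one common definition of the imbalance rate (IR),
    not necessarily the one used in the cited paper.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return max(counts.values()) / min(counts.values())


# Class counts quoted in the citation statement: 971 damaged-road samples
# and 1226 undamaged-road samples.
labels = ["damaged"] * 971 + ["undamaged"] * 1226
ir = imbalance_ratio(labels)
print(f"IR = {ir:.3f}")
```

Under this definition a perfectly balanced dataset has IR = 1, so the road dataset in the statement is only mildly imbalanced.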
“…Data level-based techniques (resampling techniques) are classified as oversampling, undersampling, or a hybrid of oversampling and undersampling techniques [38]. On the other hand, the algorithm-based techniques for imbalanced datasets address the imbalance issue by allowing for the assignment of different class weights to penalize misclassification of the minority class more heavily [39]. Ensemble learning methods represent another category of algorithmic approaches used to address the issue of class imbalance.…”
Section: Addressing the Imbalance Issue
Citation type: mentioning
confidence: 99%
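The two families of techniques named in the statement above can be illustrated with a small standard-library sketch. It is an illustration under stated assumptions, not the method of any cited paper: `balanced_class_weights` uses the common "inverse class frequency" heuristic n_samples / (n_classes * class_count), and `random_oversample` is the simplest form of oversampling (duplicating minority samples at random). Both function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Algorithm-level approach: weight each class inversely to its frequency.

    Uses the heuristic n_samples / (n_classes * class_count), so errors on
    the minority class are penalized more heavily.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Data-level approach: duplicate random samples of each smaller class
    until every class is as large as the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, cnt in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - cnt)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


# Toy data (hypothetical): 90 majority samples, 10 minority samples.
labels = [0] * 90 + [1] * 10
samples = list(range(100))

weights = balanced_class_weights(labels)
xs, ys = random_oversample(samples, labels)
print(weights)
print(Counter(ys))
```

Class weighting leaves the data untouched and changes the loss, while oversampling changes the data and leaves the learner untouched. Resampling should be applied to the training split only, so that duplicated samples do not leak into evaluation.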
“…In the case of an imbalanced dataset, this is inappropriate [31]. Zheng et al. [32] examined the effects of imbalanced data on machine learning models by evaluating eight well-known machine learning models on 48 different imbalanced datasets. The results demonstrate that, as the imbalance rate increases, the classification accuracy of the algorithms decreases.…”
Section: Investigates the Effects Of Data Balancing On Classification...
Citation type: mentioning
confidence: 99%
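The quoted statement is cut off before it says what is "inappropriate" for imbalanced data. Assuming it refers to plain accuracy as the evaluation metric, the sketch below shows why: a classifier that always predicts the majority class scores high accuracy while never detecting the minority class. The metric functions and the toy labels are hypothetical illustrations, not results from the cited papers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of samples of class `positive` that were predicted as such."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; insensitive to class proportions."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


# Toy data (hypothetical): 95 majority samples (0), 5 minority samples (1),
# and a trivial model that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)
bal_acc = balanced_accuracy(y_true, y_pred)
print(acc, minority_recall, bal_acc)
```

Here accuracy equals the majority-class share of the data, while minority recall is zero and balanced accuracy sits at the chance level of 0.5, which is why per-class or balanced metrics are usually reported alongside accuracy on imbalanced datasets.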