2023
DOI: 10.3390/info14020092

Effective Handling of Missing Values in Datasets for Classification Using Machine Learning Methods

Abstract: Missing values reduce the amount of knowledge that machine learning models learn in the training stage, which negatively affects classification accuracy. To address this challenge, we introduce the use of Support Vector Machine (SVM) regression for imputing the missing values. Additionally, we propose a two-level classification process to reduce the number of false classifications. Our evaluation of the proposed method was conducted using the PIMA Indian dataset for diabetes classific…
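The abstract names SVM regression as the imputation model but this page does not show how it is configured, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes scikit-learn is available, that only one column has gaps, and that the remaining columns are fully observed. The function name `svr_impute_column`, the RBF kernel, the values of `C` and `epsilon`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def svr_impute_column(X, target_col):
    """Fill NaNs in one column by regressing it on the other columns.

    Hypothetical helper: assumes every other column is fully observed.
    """
    X = np.array(X, dtype=float)  # works on a copy, the input is not modified
    missing = np.isnan(X[:, target_col])
    if not missing.any() or missing.all():
        return X  # nothing to fill, or nothing to learn from

    predictors = np.delete(X, target_col, axis=1)
    # Scaling matters for SVR; kernel and hyperparameters are assumptions.
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(predictors[~missing], X[~missing, target_col])
    X[missing, target_col] = model.predict(predictors[missing])
    return X


# Synthetic stand-in data (not the PIMA dataset): the last column depends on the others.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
target = features @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
data_missing = np.column_stack([features, target])
holes = rng.choice(200, size=20, replace=False)
data_missing[holes, 3] = np.nan

imputed = svr_impute_column(data_missing, target_col=3)
```

Rows where the target is observed train the regressor, and only the rows with gaps receive predictions, so observed values are never overwritten.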

Cited by 33 publications (7 citation statements)
References 46 publications
“…( 1 ) and ( 2 ), convert the values of these attributes to the numerical values. Handling Missing or Null Values Missing or null values are values that may lead to incorrect predictions or inference for each class in the classification [ 29 , 30 ], so here, a little number of attribute values were completing (missing values) with the consultation of a doctor specializing in endocrinology and metabolism using the k-NN Imputation method. The result of this process is shown in Fig.…”
Section: Methods (mentioning)
confidence: 99%
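The citing statement above mentions k-NN imputation without giving details. As a rough illustration of the technique only (not the citing authors' procedure), a missing entry can be filled with the mean of that feature over the k most similar rows. The sketch below is a hand-rolled, standard-library version; `knn_impute`, the distance normalisation, and the toy rows are assumptions made for the example.

```python
import math


def _distance(a, b):
    """Root-mean-square difference over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have feature j observed.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] < math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no comparable row exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],  # gap to fill
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],  # far away, should not be chosen as a donor
]
completed = knn_impute(rows, k=2)
```

With k=2 the gap is filled from the two nearby rows, so the distant fourth row has no influence on the imputed value.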
“…No explicit solution exists to minimize (4); an iterative algorithm Expectation Maximization-PCA (EM-PCA) 25 is proposed to solve it. The EM-PCA algorithm is also called iterative PCA, and the missing values are generated as a fixed structure having a low-rank representation in 𝑆 dimensions corrupted by noise as (5).…”
Section: PCA Imputes Missing Values (mentioning)
confidence: 99%
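The statement above describes EM-PCA (iterative PCA) only in outline, and equations (4) and (5) of the citing paper are not reproduced on this page. A common form of the idea alternates between fitting a low-rank approximation and refreshing the missing cells from it. The NumPy sketch below follows that generic scheme under stated assumptions: `rank` plays the role of the S dimensions, initialisation uses column means, and the function name, tolerance, and toy matrix are illustrative.

```python
import numpy as np


def iterative_pca_impute(X, rank=1, n_iter=200, tol=1e-8):
    """Alternate between a rank-`rank` PCA fit and refilling the missing cells."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Low-rank reconstruction of the currently completed matrix.
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean
        # Observed cells keep their original values; only gaps are updated.
        updated = np.where(missing, low_rank, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


# Toy matrix with rank-1 structure and a single gap.
toy = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
toy_missing = toy.copy()
toy_missing[2, 1] = np.nan

toy_filled = iterative_pca_impute(toy_missing, rank=1)
```

The choice of `rank` acts as the tuning parameter: too small underfits the structure, too large lets the reconstruction chase noise.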
“…Although there are now larger, more complex diabetes datasets, the Pima Indian Diabetes dataset has remained a benchmark for diabetes classification research. As the Pima Indian Diabetes dataset is challenging to classify due to its missing values, much research has been done on the Pima Indian Diabetes dataset missing values imputation to improve the classification accuracy, such as using the fuzzy feature selection method 4 , Support Vector Machine (SVM) method 5 , and tree-based method to impute the missing values 6 . In this literature, we compared four machine learning missing value imputation models (Clustering, K Nearest Neighbors, Markov Chain Monte Carlo, and Principal Component Analysis) to find the lowest imputation test-validation mean square error with its optimal tuning parameter in the Pima Indian Diabetes dataset.…”
Section: Introduction (mentioning)
confidence: 99%
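The comparison described above ranks imputation models by a validation mean square error. One common way to obtain such an error, sketched here as an assumption rather than the citing paper's protocol, is to hide some values that are actually known, impute them, and score the imputations against the hidden truth. Mean and median imputation stand in for the four models named in the statement, and all function names and data are hypothetical.

```python
import random
import statistics


def mean_impute(column):
    """Replace None with the mean of the observed values."""
    fill = statistics.fmean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def median_impute(column):
    """Replace None with the median of the observed values."""
    fill = statistics.median(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def masked_mse(column, imputer, n_hidden, seed=0):
    """Hide n_hidden known values, impute them, and return the mean square error."""
    rng = random.Random(seed)
    observed_idx = [i for i, v in enumerate(column) if v is not None]
    hidden = rng.sample(observed_idx, n_hidden)
    masked = list(column)
    for i in hidden:
        masked[i] = None
    imputed = imputer(masked)
    return sum((imputed[i] - column[i]) ** 2 for i in hidden) / n_hidden


# Illustrative measurements with one real gap and one outlier.
column = [98.0, 101.0, 99.0, None, 102.0, 100.0, 97.0, 250.0, 103.0, 96.0]
scores = {
    "mean": masked_mse(column, mean_impute, n_hidden=3),
    "median": masked_mse(column, median_impute, n_hidden=3),
}
best_model = min(scores, key=scores.get)
```

Using the same seed for every candidate hides the same cells each time, so the scores are directly comparable.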
“…Properly addressing missing data ensures the robustness and validity of research findings, enabling accurate interpretation and generalization of results. Techniques to manage missing values range from simple imputation to more complex statistical models, each carrying different implications for the integrity of the dataset and the reliability of subsequent findings [ 6 , 7 , 8 ].…”
Section: Introduction (mentioning)
confidence: 99%