2020
DOI: 10.22266/ijies2020.0229.31
Optimization of Feature Selection Using Genetic Algorithm in Naïve Bayes Classification for Incomplete Data

Abstract: In the case of high-dimensional data with missing values, data collected from various sources may be missed accidentally, which affects the quality of learning outcomes. A large number of machine learning methods can be applied to explore the search space for imputation and for the selection of features and parameters. ML classification needs preprocessing with self-organizing map imputation (SOMI) before the imputation of missing values is done to improve the accuracy of the model. This study introduces…

Cited by 21 publications (14 citation statements)
References 27 publications
“…This is due to many reasons, which can be summarized as follows: (1) NB provides faster predictions than other classification algorithms because its training time is O(N) in the size of the dataset; (2) it can be trained on a small amount of input training data and also scales to large datasets; (3) it is simple and easy to implement, with the ability to train in real time on new items; (4) the classifier requires no parameter tuning or domain knowledge; (5) it handles both continuous and discrete data; (6) NB is less sensitive to missing data; (7) NB handles noise in the dataset well; (8) NB is an incremental learning approach, because it works from approximations of low-order probabilities extracted from the training data, which can be updated quickly as new training data arrive; (9) if the Naive Bayes conditional independence assumption holds, it converges more quickly than discriminative models such as logistic regression; (10) NB can be used for both binary and multiclass classification problems; and (11) NB is suitable for real-time applications such as disease diagnosis because it relies on a set of pre-computed probabilities that make classification very fast (Khotimah et al 2020; Kaur and Oberoi 2020). Although NB has proven efficient in real-time applications, its performance sometimes suffers because of the unrealistic assumption that all features have the same degree of importance and are independent given the class value. Hence, this unrealistic assumption should be mitigated to overcome such hurdles.…”
Section: Introduction (mentioning)
confidence: 99%
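The incremental, low-order-probability character of NB noted in points (1), (8), and (11) above can be sketched as follows. This is a minimal illustrative implementation, not the method of the cited paper; the class name, example features, and add-one smoothing are my own assumptions:

```python
import math
from collections import defaultdict

class IncrementalNB:
    """Categorical Naive Bayes built from low-order counts, so a new
    training item only increments counters (incremental learning)."""

    def __init__(self):
        self.class_counts = defaultdict(int)  # N(c)
        self.feat_counts = defaultdict(int)   # N(c, feature index, value)
        self.total = 0

    def update(self, features, label):
        # One pass per item, so training is O(N) over the dataset.
        self.class_counts[label] += 1
        self.total += 1
        for i, v in enumerate(features):
            self.feat_counts[(label, i, v)] += 1

    def predict(self, features):
        # Score = log P(c) + sum_i log P(x_i | c), with add-one smoothing;
        # classification only looks up pre-computed counts, so it is fast.
        best, best_score = None, float("-inf")
        for c, nc in self.class_counts.items():
            score = math.log(nc / self.total)
            for i, v in enumerate(features):
                score += math.log((self.feat_counts[(c, i, v)] + 1) / (nc + 2))
            if score > best_score:
                best, best_score = c, score
        return best
```

Because the model is just a table of counts, point (8) falls out directly: adding a new training item never requires retraining from scratch.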
“…This algorithm is a statistical prediction model that predicts the output variable from the inputs under the assumption that the input variables do not affect one another. In other words, no combination of input variables determines the probability of the output variable occurring (28,29). MLP: This algorithm consists of computational units known as neurons arranged in the input, hidden, and output layers of an artificial neural network (ANN).…”
Section: Model Development and Assessment (mentioning)
confidence: 99%
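The layered neuron computation the excerpt describes for the MLP can be sketched in a few lines. This is a generic illustration, not the cited model; the function name, weight layout, and sigmoid activation are my own assumptions:

```python
import math

def mlp_forward(x, weights, biases):
    """Forward pass of a simple MLP: each layer's neurons compute a
    weighted sum of the previous layer's outputs plus a bias, followed
    by a sigmoid activation."""
    a = x
    for W, b in zip(weights, biases):  # one (W, b) pair per layer
        a = [
            1.0 / (1.0 + math.exp(-(sum(w * ai for w, ai in zip(row, a)) + bi)))
            for row, bi in zip(W, b)
        ]
    return a
```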
“…During the evolution process, a number of the genes that make up a chromosome undergo crossover and mutation. The Genetic Algorithm uses probabilistic transitions to select the best chromosomes in order to obtain an optimal solution [23].…”
Section: Genetic Algorithm (unclassified)
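The crossover, mutation, and probabilistic selection steps described in this excerpt can be sketched for binary chromosomes. This is an illustrative sketch, not the cited paper's implementation; single-point crossover, per-gene bit-flip mutation, and roulette-wheel selection are assumed variants:

```python
import random

def crossover(p1, p2):
    # Single-point crossover: genes before the cut come from parent 1,
    # the rest from parent 2.
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:]

def mutate(chrom, rate=0.1):
    # Flip each binary gene independently with probability `rate`.
    return [1 - g if random.random() < rate else g for g in chrom]

def select(population, fitness):
    # Probabilistic (roulette-wheel) selection: fitter chromosomes are
    # proportionally more likely to be chosen, matching the
    # "probabilistic transition" described above.
    return random.choices(population, weights=[fitness(c) for c in population], k=1)[0]
```

In a feature-selection setting such as this paper's, each gene would indicate whether the corresponding feature is kept, and the fitness function would score a classifier trained on that feature subset.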