2020
DOI: 10.1007/s42452-020-3129-x

A hybrid model for class noise detection using k-means and classification filtering algorithms

Abstract: Real data may contain a considerable amount of noise produced by errors in data collection, transmission and storage. A noisy training data set increases the training time and the complexity of the induced machine learning model, which reduces overall performance. Identifying noisy instances and then eliminating or correcting them are useful techniques in data mining research. This paper investigates the issue of misclassified instances and proposes a clustering-based and classification filtering algorithm (CLCF…

Cited by 5 publications (1 citation statement)
References 33 publications
“…The noisy training data set increases the training time and complexity of the model. Consequently, identifying noisy instances and then eliminating or correcting them are useful techniques in data mining research (Nematzadeh et al, 2020). Chen et al found that the presence of noisy samples can significantly impact the predictive performance of the LDA model (Chen et al, 2021b). Some papers (Yao et al, 2020; Wei et al, 2021; Kang et al, 2022; Lu and Xie, 2023) have used random sampling to create balanced datasets by including an equal number of unknown and positive samples in an attempt to mitigate the impact of unbalanced datasets.…”
Section: Introduction (mentioning)
confidence: 99%