2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon) 2019
DOI: 10.1109/comitcon.2019.8862250

Hybrid Pre-processing Technique for Handling Imbalanced Data and Detecting Outliers for KNN Classifier

Cited by 32 publications (18 citation statements)
References 6 publications
“…Moreover, we selected the optimal sample size for MC simulation by measuring the percentage of variance of the PCA components trained on the MC pre-processed dataset explained by the PCA components trained on the IQR pre-processed dataset in terms of R² due to many considerations in the literature which report pre-processing techniques for similar anomaly detection scenarios based on the IQR method [24]. This analysis led to an optimal value of three samples to be considered for the median computation.…”
Section: Discussion (mentioning)
confidence: 99%
“…The related literature also reports pre-processing techniques for similar anomaly detection scenarios based on the IQR method (e.g., [24]), which, however, offers only the property of outlier removal and not the additional benefit of outlier replacement that is consequential to applying MC simulation, as further discussed in Section 3.…”
Section: Related Work (mentioning)
confidence: 99%
“…In paper [3] the authors have explored ways to increase the efficiency of KNN algorithm so that it can give out better results. The evolutionary Genetic Algorithm is used to select the finest parameters of the nonlinear functions that are suitable for each feature, and the results are better comparatively and on similar lines in paper [4], Preeti Nair and Indu Kashyap have made implored that by introducing resample technique and Inter quartile range technique (IQR) in the pre-processing steps the data fed to classifiers are normalized which gives out better working of the algorithm.…”
Section: Related Work (mentioning)
confidence: 92%
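The citation statements above refer to IQR-based outlier handling as a pre-processing step. As a minimal sketch of that general idea, and not the cited paper's actual procedure, the following Python drops values outside the conventional 1.5 × IQR fences. The function names `quartiles` and `iqr_filter`, the multiplier `k=1.5`, and the linear-interpolation quartile rule are all assumptions made for illustration.

```python
def quartiles(values):
    """Return (Q1, Q3) using linear interpolation between sorted points.

    The interpolation rule is an assumption; other quartile conventions
    give slightly different fences.
    """
    data = sorted(values)

    def percentile(p):
        pos = p * (len(data) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        return data[lo] + (data[hi] - data[lo]) * (pos - lo)

    return percentile(0.25), percentile(0.75)


def iqr_filter(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    if not values:
        return []
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical single-feature sample with one obvious outlier (100).
sample = [10, 12, 11, 13, 12, 11, 100]
cleaned = iqr_filter(sample)
```

Note that this sketch only removes outliers; as one of the citing papers points out, IQR filtering by itself does not replace the removed values.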
“…Datasets that contain missing values, noisy data, and class imbalance can affect classification results and accuracy [3]. Missing values occur when data are missing for an attribute in a dataset [4]. Noisy data occur when a dataset contains data that are out of range or unrelated [5].…”
Section: Introduction (unclassified)
“…Attribute normalization is the process of converting an attribute's data from numeric values to a common scale and vice versa, without disturbing the variation in the data [8]. An imbalanced dataset occurs when the label counts in a dataset are unequal or differ too widely [4]. In classification methods, the label serves as the reference for measuring the performance of the resulting model [9], so if one label or class in a dataset has significantly more instances than the others, testing or classification results on test data will likely tend toward the label that had more instances during training.…”
Section: Introduction (unclassified)
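The statement above describes how a classifier tends toward the majority label when training labels are imbalanced. One common remedy is random oversampling of the smaller classes; the sketch below illustrates that technique only and is not claimed to be the resampling method used in the cited paper. The function name `random_oversample`, the `seed` parameter, and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)

    target = max(len(group) for group in by_label.values())

    out_rows, out_labels = [], []
    for label, group in by_label.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: four rows of class "a", one row of class "b".
rows = [[1], [2], [3], [4], [9]]
labels = ["a", "a", "a", "a", "b"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
counts = Counter(balanced_labels)
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the test data.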