2016
DOI: 10.31154/cogito.v1i1.2.13-23
Comparative Study of Classification Algorithms: Holdouts as Accuracy Estimation

Abstract: This study aims to measure and compare the performance of five machine-learning text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), on multi-class text documents. The comparison concerns the effectiveness of each algorithm, i.e. its ability to assign documents to the correct category, using the holdout (percentage split) method. The effectiveness measures used are precision, recall, F-measure…
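As a rough illustration of such a holdout (percentage-split) comparison, the Python sketch below uses scikit-learn with TF-IDF features and the 20 Newsgroups corpus as a stand-in multi-class text collection. The split ratio, classifier settings, and the omission of a decision-rules learner (scikit-learn has no direct equivalent) are assumptions for demonstration, not details taken from the paper.

```python
# Hypothetical holdout comparison of text classifiers (not the paper's exact setup).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_recall_fscore_support

# Multi-class text corpus used as a placeholder for the paper's documents.
data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
X = TfidfVectorizer(max_features=5000).fit_transform(data.data)
y = data.target

# Holdout / percentage split: here 66% training, 34% testing (assumed ratio).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.34, random_state=0, stratify=y)

classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
}

for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    # Macro-averaged precision, recall, and F-measure over all categories.
    p, r, f1, _ = precision_recall_fscore_support(
        y_te, clf.predict(X_te), average="macro")
    print(f"{name}: precision={p:.3f} recall={r:.3f} F-measure={f1:.3f}")
```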

Cited by 2 publications (2 citation statements)
References 5 publications
“…In the random forest analysis, the number of available variables for splitting at each tree node was calculated as the square root of the number of predictor variables (rounded down). The repeated 3-fold stratified cross-validation approach was used to validate the models [50, 51]. Due to the unbalanced nature of the dataset, the SMOTE technique was used to attenuate the bias towards the classification in the majority class in each training fold [52].…”
Section: Methods (mentioning)
confidence: 99%
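A minimal sketch of this validation setup, assuming scikit-learn and imbalanced-learn: an imblearn Pipeline applies SMOTE only inside each training fold, the random forest considers floor(sqrt(p)) candidate variables per split, and the folds come from repeated stratified 3-fold cross-validation. The toy dataset, repeat count, and scoring metric are illustrative assumptions, not values from the cited study.

```python
# Repeated stratified 3-fold CV with SMOTE applied per training fold (sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE      # requires imbalanced-learn
from imblearn.pipeline import Pipeline        # resamples the training folds only

# Imbalanced toy dataset standing in for the study's data.
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.85, 0.15], random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("rf", RandomForestClassifier(
        max_features="sqrt",      # floor(sqrt(p)) candidate variables per split
        n_estimators=500, random_state=0)),
])

# Repeated 3-fold stratified cross-validation (5 repeats assumed here).
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=5, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1")
print(f"F1: {scores.mean():.3f} ± {scores.std():.3f}")
```

Wrapping SMOTE and the classifier in an imblearn Pipeline is what keeps oversampling confined to the training portion of each fold, so the held-out fold is never synthetically inflated.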
“…It is simple and easy to understand, and the user does not need to do much to prepare the data. Rules can be generated quickly, and the complexity of the problem is also reduced [13], [14]. The DT is a non-parametric algorithm, so it can be applied to large and complex datasets without imposing a rigid parametric structure.…”
Section: A. Background (unclassified)
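To illustrate the point about quickly generated, readable rules, the hypothetical sketch below fits a shallow scikit-learn decision tree on a toy dataset and prints its learned rules; the dataset and depth limit are assumptions for demonstration, not from the cited work.

```python
# Fit a small decision tree and read out its rules directly (illustrative only).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Non-parametric: no distributional form is assumed; the model *is* its rules.
print(export_text(tree, feature_names=load_iris().feature_names))
```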