2019
DOI: 10.14710/jtsiskom.8.1.2020.54-58

Comparison of distance measurement on k-nearest neighbour in textual data classification

Abstract: One algorithm for classifying textual data in automatic document-organizing applications is KNN, which converts word representations into vectors. The distance calculation in the KNN algorithm is essential for measuring the closeness between data elements. This study compares four distance calculations commonly used in KNN, namely Euclidean, Chebyshev, Manhattan, and Minkowski. The dataset comprises 448 comments collected from Eminem's YouTube videos. This study showed that Euclidean or Minkowski on the KNN …
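The four distance calculations the abstract compares can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the toy vectors stand in for vectorized comment text. Note that Minkowski with p = 2 reduces to Euclidean, which is consistent with the abstract reporting the two together.

```python
def minkowski(a, b, p):
    """Generalized Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def euclidean(a, b):
    return minkowski(a, b, 2)

def manhattan(a, b):
    return minkowski(a, b, 1)

def chebyshev(a, b):
    """Limit of Minkowski as p grows: the largest per-axis difference."""
    return max(abs(x - y) for x, y in zip(a, b))

# Two toy term-frequency vectors standing in for vectorized comments
u = [1, 0, 2, 3]
v = [0, 1, 2, 1]

print(euclidean(u, v))  # sqrt(6) ≈ 2.449
print(manhattan(u, v))  # 4
print(chebyshev(u, v))  # 2
```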

Cited by 15 publications (12 citation statements). References 11 publications.
“…The effectiveness of the K-NN algorithm's classification results is also influenced by selecting the proper distance metric, such as Euclidean or Manhattan, since it changes how the clusters form [13]. Studies in this discussion of textual data classification show that the Euclidean distance metric yields the best performance (accuracy of 85.5%) compared to the Manhattan distance (accuracy of 85.48%) [14]. Research on stroke disease detection shows that the Manhattan distance performs better in classification than the Euclidean distance, with an accuracy of 96.03% against 95.93% [15].…”
Section: I (mentioning)
confidence: 95%
“…Before grouping the data for the detection process, the distance measure between data elements is defined first. In various applications, different distance measurement methods are used to assess the degree of similarity between data, such as the Euclidean, Manhattan (City Block), Mahalanobis, Correlation, Angle-based, Minkowski, and Squared Euclidean distances [3].…”
Section: Introduction (unclassified)
“…In the classification phase, the same features are computed for the test data (whose class is unknown). After the distances from the new vector to all training vectors are computed and the K nearest ones are selected, the classification is determined from those points (Wahyono et al., 2020). The K-Nearest Neighbor (KNN) algorithm is a method for classifying objects based on the training data closest to those objects.…”
Section: Introduction (unclassified)
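The classification phase described above (compute distances from the new vector to all training vectors, take the K nearest, decide by majority vote) can be sketched end to end. The training set and labels here are hypothetical toy data, and `math.dist` (Euclidean distance, Python 3.8+) stands in for whichever metric is chosen.

```python
import math
from collections import Counter

# Hypothetical training set: 2-D feature vectors with known classes
train = [((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"),
         ((4.0, 4.0), "ham"),  ((4.2, 3.9), "ham")]

def classify(query, k=3):
    # Distance from the new vector to every training vector
    dists = [(math.dist(query, vec), label) for vec, label in train]
    # Take the K nearest and decide the class by majority vote
    nearest = sorted(dists)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(classify((1.1, 0.9)))  # "spam"
print(classify((3.8, 4.1)))  # "ham"
```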