2022
DOI: 10.1088/1742-6596/2161/1/012004

Study of distance metrics on k-nearest neighbor algorithm for star categorization

Abstract: Classification of stars is essential to investigate the characteristics and behavior of stars. Performing classifications manually is error-prone and time-consuming. Machine learning provides a computerized solution to handle huge volumes of data with minimal human input. k-Nearest Neighbor (kNN) is one of the simplest supervised learning approaches in machine learning. This paper aims at studying and analyzing the performance of the kNN algorithm on the star dataset. In this paper, we have analyzed the accuracy…
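A minimal sketch, not the authors' code, of the kind of experiment the abstract describes: comparing kNN accuracy across distance metrics. The file name "star.csv" and its "class" column are hypothetical stand-ins for the star dataset used in the paper; any feature matrix with class labels would work the same way.

```python
# Hedged sketch: compare kNN accuracy under several distance metrics.
# "star.csv" and the "class" column are assumed placeholders, not the paper's data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("star.csv")                        # hypothetical file name
X = StandardScaler().fit_transform(df.drop(columns=["class"]))
y = df["class"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)
for metric in ("euclidean", "manhattan", "chebyshev"):
    clf = KNeighborsClassifier(n_neighbors=5, metric=metric).fit(X_tr, y_tr)
    print(f"{metric:>10s} accuracy: {clf.score(X_te, y_te):.3f}")
```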

Cited by 30 publications (5 citation statements) | References 11 publications
“…We took 100-nearest neighbors for each node to strike a balance between representativeness and computational feasibility. We utilized the Minkowski distance measure with p = 2 to enable a standard Euclidean distance metric, thereby compatible with k-means clustering [29]. Subsequently, we applied the Louvain algorithm, which is one of the fastest and most popular methods for community detection [30].…”
Section: Methods (mentioning)
confidence: 99%
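A minimal sketch, under assumptions, of the pipeline the quoted passage describes: build a 100-nearest-neighbor graph with the Minkowski metric at p = 2 (i.e., Euclidean distance) and run Louvain community detection on it. The synthetic matrix X is a placeholder, not the citing paper's data.

```python
# Hedged sketch: kNN graph (Minkowski, p=2) followed by Louvain community detection.
import numpy as np
import networkx as nx
from networkx.algorithms import community
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                       # placeholder feature matrix

# 100 nearest neighbors per node; metric="minkowski" with p=2 is the Euclidean distance
adj = kneighbors_graph(X, n_neighbors=100, mode="connectivity",
                       metric="minkowski", p=2)

G = nx.from_scipy_sparse_array(adj)                 # undirected kNN graph
parts = community.louvain_communities(G, seed=0)    # Louvain partition
print(f"{len(parts)} communities found")
```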
“…In both these methods, the success metrics (e.g., accuracy) are averaged across all the folds, or in the case of the test-train partition, averaged across a few trials of random partitions. Some common train-test split ratios are 66:33 [1], 70:30 [2,3], and 80:20 [23], and some common folds are 5-fold [24] and 10-fold [5,6,25,26]. Regardless of the ratio of the partition, the randomness of how the data points are divided affects the performance of the classifier, which impacts the repeatability of the experiment [27].…”
Section: Equation (mentioning)
confidence: 99%
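A minimal sketch, on a placeholder dataset, of the two evaluation protocols the quote contrasts: a single random train-test partition at a common ratio versus k-fold cross-validation with accuracy averaged across folds.

```python
# Hedged sketch: 70:30 hold-out split vs. 10-fold cross-validation for a kNN classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                   # placeholder dataset
clf = KNeighborsClassifier(n_neighbors=5)

# 70:30 random partition; random_state controls which points land in each set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=42)
split_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

# 10-fold cross-validation; accuracy is averaged across the folds
cv_acc = cross_val_score(clf, X, y, cv=10).mean()
print(f"70:30 split accuracy: {split_acc:.3f}, 10-fold CV accuracy: {cv_acc:.3f}")
```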
“…The question of how to partition the data into a training set and testing set is also an important consideration. In the literature, a common approach is to split the benchmark datasets with a 66:33 [1], 70:30 [2,3] or 80:20 [23] train-test partition. The success of the classifier is partially dependent on which data are in the training set and which data are in the testing set, especially in small datasets or datasets with under-represented classes.…”
Section: Train-test Split (mentioning)
confidence: 99%
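A minimal sketch, on a placeholder dataset, of the dependence the quote points out: with the partition ratio held fixed at 66:33, the measured accuracy still varies with the random assignment of points to the training and testing sets.

```python
# Hedged sketch: accuracy of a fixed 66:33 split varies with the random seed.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)                   # placeholder dataset
scores = []
for seed in range(5):                               # five different random partitions
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    scores.append(clf.score(X_te, y_te))
print("66:33 split accuracies across seeds:", [round(s, 3) for s in scores])
```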
“…) [27]. In fact, the distance calculation is the length to be considered as the homogeneity criterion for acceptance in a group; for a dataset of points it is given by:…”
Section: kNN Overview (mentioning)
confidence: 99%
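The formula in the quoted passage did not survive extraction. As a point of reference only, and not a reconstruction of the cited text, the Minkowski distance commonly used as this homogeneity criterion in kNN, together with its Euclidean special case at p = 2 (the setting studied in the paper), is:

```latex
d_p(x, y) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p},
\qquad
d_2(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```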