2022
DOI: 10.1088/1742-6596/2161/1/012004

Study of distance metrics on k-nearest neighbor algorithm for star categorization

Abstract: Classification of stars is essential to investigate the characteristics and behavior of stars. Performing classifications manually is error-prone and time-consuming. Machine learning provides a computerized solution to handle huge volumes of data with minimal human input. k-Nearest Neighbor (kNN) is one of the simplest supervised learning approaches in machine learning. This paper aims at studying and analyzing the performance of the kNN algorithm on the star dataset. In this paper, we have analyzed the accuracy…
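A minimal sketch, not the authors' code, of the kind of experiment the abstract describes: comparing kNN accuracy across distance metrics. The file name "star.csv" and its "class" column are hypothetical stand-ins for the star dataset used in the paper; any feature matrix with class labels would work the same way.

```python
# Hedged sketch: compare kNN accuracy under several distance metrics.
# "star.csv" and the "class" column are assumed placeholders, not the paper's data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("star.csv")                        # hypothetical file name
X = StandardScaler().fit_transform(df.drop(columns=["class"]))
y = df["class"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)
for metric in ("euclidean", "manhattan", "chebyshev"):
    clf = KNeighborsClassifier(n_neighbors=5, metric=metric).fit(X_tr, y_tr)
    print(f"{metric:>10s} accuracy: {clf.score(X_te, y_te):.3f}")
```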

Cited by 30 publications (5 citation statements) | References 11 publications
“…We took 100-nearest neighbors for each node to strike a balance between representativeness and computational feasibility. We utilized the Minkowski distance measure with p = 2 to enable a standard Euclidean distance metric, thereby compatible with k-means clustering [29]. Subsequently, we applied the Louvain algorithm, which is one of the fastest and most popular methods for community detection [30].…”
Section: Methods (mentioning)
confidence: 99%
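A minimal sketch, under assumptions, of the pipeline the quoted passage describes: build a 100-nearest-neighbor graph with the Minkowski metric at p = 2 (i.e., Euclidean distance) and run Louvain community detection on it. The synthetic matrix X is a placeholder, not the citing paper's data.

```python
# Hedged sketch: kNN graph (Minkowski, p=2) followed by Louvain community detection.
import numpy as np
import networkx as nx
from networkx.algorithms import community
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                       # placeholder feature matrix

# 100 nearest neighbors per node; metric="minkowski" with p=2 is the Euclidean distance
adj = kneighbors_graph(X, n_neighbors=100, mode="connectivity",
                       metric="minkowski", p=2)

G = nx.from_scipy_sparse_array(adj)                 # undirected kNN graph
parts = community.louvain_communities(G, seed=0)    # Louvain partition
print(f"{len(parts)} communities found")
```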
“…In both these methods, the success metrics (e.g., accuracy) are averaged across all the folds, or in the case of the test-train partition, averaged across a few trials of random partitions. Some common train-test split ratios are 66:33 [1], 70:30 [2,3], and 80:20 [23], and some common folds are 5-fold [24] and 10-fold [5,6,25,26]. Regardless of the ratio of the partition, the randomness of how the data points are divided affects the performance of the classifier, which impacts the repeatability of the experiment [27].…”
Section: Equation (mentioning)
confidence: 99%
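A minimal sketch, on a placeholder dataset, of the two evaluation protocols the quote contrasts: a single random train-test partition at a common ratio versus k-fold cross-validation with accuracy averaged across folds.

```python
# Hedged sketch: 70:30 hold-out split vs. 10-fold cross-validation for a kNN classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                   # placeholder dataset
clf = KNeighborsClassifier(n_neighbors=5)

# 70:30 random partition; random_state controls which points land in each set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=42)
split_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

# 10-fold cross-validation; accuracy is averaged across the folds
cv_acc = cross_val_score(clf, X, y, cv=10).mean()
print(f"70:30 split accuracy: {split_acc:.3f}, 10-fold CV accuracy: {cv_acc:.3f}")
```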
“…The question of how to partition the data into a training set and testing set is also an important consideration. In the literature, a common approach is to split the benchmark datasets with a 66:33 [1], 70:30 [2,3] or 80:20 [23] train-test partition. The success of the classifier is partially dependent on which data are in the training set and which data are in the testing set, especially in small datasets or datasets with under-represented classes.…”
Section: Train-test Split (mentioning)
confidence: 99%
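A minimal sketch, on a placeholder dataset, of the dependence the quote points out: with the partition ratio held fixed at 66:33, the measured accuracy still varies with the random assignment of points to the training and testing sets.

```python
# Hedged sketch: accuracy of a fixed 66:33 split varies with the random seed.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)                   # placeholder dataset
scores = []
for seed in range(5):                               # five different random partitions
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    scores.append(clf.score(X_te, y_te))
print("66:33 split accuracies across seeds:", [round(s, 3) for s in scores])
```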
“…) [27]. In fact, the distance calculation is the length to be considered as the homogeneity criterion for acceptance in a group; for a dataset of points it is given by:…”
Section: kNN Overview (mentioning)
confidence: 99%
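The formula in the quoted passage did not survive extraction. As a point of reference only, and not a reconstruction of the cited text, the Minkowski distance commonly used as this homogeneity criterion in kNN, together with its Euclidean special case at p = 2 (the setting studied in the paper), is:

```latex
d_p(x, y) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p},
\qquad
d_2(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```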