2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018
DOI: 10.1109/icde.2018.00115
Fast k-Means Based on k-NN Graph

Abstract: In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost can be prohibitively high when the data size and the number of clusters are large. It is well known that the processing bottleneck of k-means lies in the operation of seeking the closest centroid in each iteration. In this paper, a novel solution to the scalability issue of k-means is presented. In the proposal, k-means is supported by an approximate k-nearest neighbor…
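The per-iteration bottleneck the abstract refers to can be illustrated with a minimal sketch. This is not the paper's graph-based method (the abstract is truncated before describing it); it is just the standard brute-force assignment step, with hypothetical function and variable names, showing why each iteration costs O(n·k) distance computations:

```python
import numpy as np

def assign_closest_centroid(points, centroids):
    """Brute-force k-means assignment step: every point is compared
    against all k centroids, which is the O(n * k) bottleneck that
    graph-based approximations aim to avoid."""
    # squared Euclidean distance from each point to each centroid,
    # via broadcasting: shape (n, k)
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # index of the nearest centroid for each point
    return dists.argmin(axis=1)

# toy example: 6 points forming two obvious clusters, 2 centroids
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_closest_centroid(points, centroids)  # -> [0 0 0 1 1 1]
```

An approximate k-NN graph over the data lets each point restrict this search to a neighborhood rather than scanning all k centroids, which is the direction the paper's title indicates.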

Cited by 15 publications (2 citation statements)
References 36 publications
“…For example, nonlinear SVM can be capable of dealing with high-dimensional data but may not be robust to the presence of diverse chemical descriptors. 17 Deng and Zhao 18 reported that the computational cost of KNN increases exponentially with the size of the input samples. Recently, deep learning (DL) has attracted much attention for predicting the outcome of biological assays and becomes a key candidate for toxicity prediction due to its ability to bypass feature extraction.…”
Section: Introduction
confidence: 99%
“…These models perform relatively better on smaller data sets with fewer preselected features. One key limitation of KNN algorithm is the exponential rise of computational cost with the size of the input samples. In contrast, nonlinear SVMs can manage high dimensional data but do not exhibit sufficiently robust performance on diverse chemical descriptors …”
Section: Introduction
confidence: 99%