Proceedings of the 7th ACM International Conference on Web Search and Data Mining 2014
DOI: 10.1145/2556195.2556260
Scalable K-Means by ranked retrieval

Abstract: The k-means clustering algorithm has a long history and proven practical performance; however, it does not scale to clustering millions of data points into thousands of clusters in high-dimensional spaces. The main computational bottleneck is the need to recompute the nearest centroid for every data point at every iteration, a prohibitive cost when the number of clusters is large. In this paper we show how to reduce the cost of the k-means algorithm by large factors by adapting ranked retrieval techniques. Us…
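To make the bottleneck in the abstract concrete, here is a minimal sketch of standard (exact) k-means. The assignment step costs O(n·k·d) per iteration, which is exactly the term the paper's ranked-retrieval adaptation targets. This is an illustrative baseline, not the paper's algorithm; function names are my own.

```python
import numpy as np

def kmeans_assign(points, centroids):
    """Exact assignment step: find each point's nearest centroid.

    This is the O(n * k * d) bottleneck: with millions of points (n),
    thousands of clusters (k), and high dimension (d), recomputing it
    at every iteration is prohibitively expensive.
    """
    # Pairwise squared distances via broadcasting -> shape (n, k)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, iters=10, seed=0):
    """Lloyd's algorithm with random initialization (illustrative only)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = kmeans_assign(points, centroids)
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels
```

Every iteration pays the full n·k distance matrix; the techniques cited below avoid exactly that recomputation.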

Cited by 52 publications (47 citation statements)
References 28 publications
“…Comparison with IQ-means: IQ-means is an accelerated version of ranked-retrieval [8] that skips distance computations when vectors are placed far away from centers. IQ-means can be the fastest clustering method for large-scale data.…”
Section: Discussion
confidence: 99%
“…Such approximated k-means methods include approximated search [31], hierarchical search [27], approximated bounds [38], and batch-based methods [26,34]. If the size of the input data is large, subset-based methods [2,8] can achieve the fastest performance. These methods only treat a subset of the input vectors (i.e., vectors close to each center), making the computation efficient.…”
Section: Related Work
confidence: 99%
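The quote above describes methods that skip distance computations for vectors far from a center. A minimal sketch of that pruning idea, using the classic half-distance bound (if a point is within half the distance from its center to any other center, it cannot switch clusters), is shown below. This is an illustrative stand-in for the general idea, not the actual ranked-retrieval or IQ-means algorithm; all names are hypothetical.

```python
import numpy as np

def assign_with_pruning(points, centroids, labels, dist_to_assigned):
    """One assignment pass that skips full distance computations for
    points provably unable to change cluster.

    Uses the half-distance bound: if d(x, c) <= s(c)/2, where s(c) is
    the distance from center c to its nearest other center, then x
    stays assigned to c and no other distances need computing.
    """
    # s[c] = distance from center c to its nearest other center
    cc = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=2)
    np.fill_diagonal(cc, np.inf)
    s = cc.min(axis=1)

    skipped = 0
    for i, x in enumerate(points):
        if dist_to_assigned[i] <= 0.5 * s[labels[i]]:
            skipped += 1  # provably cannot switch; skip all k distances
            continue
        d = np.linalg.norm(centroids - x, axis=1)
        labels[i] = d.argmin()
        dist_to_assigned[i] = d[labels[i]]
    return labels, skipped
```

When clusters are well separated, most points satisfy the bound and the per-iteration cost drops far below n·k distance evaluations.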
“…The related research includes SNOB [62], MCLUST [63], k-medoids, and k-means related research [64,65]. Density-based partitioning methods attempt to discover low-dimensional data, which is dense-connected, known as spatial data.…”
Section: Clustering Algorithms
confidence: 99%
“…In recent years, continuous efforts have been devoted to looking for effective solutions that are still workable in webscale data. Representative works are [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]. However, most of the k-means variants achieve high speed efficiency while sacrificing the clustering quality.…”
Section: Introduction
confidence: 99%