2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX) 2015
DOI: 10.1137/1.9781611974317.7

An Algorithm for Online K-Means Clustering

Abstract: This paper shows that one can be competitive with the k-means objective while operating online. In this model, the algorithm receives vectors v_1, ..., v_n one by one in an arbitrary order. For each vector v_t, the algorithm outputs a cluster identifier before receiving v_{t+1}. Our online algorithm generates O(k log n log γn) clusters whose expected k-means cost is O(W* log n). Here, W* is the optimal k-means cost using k clusters and γ is the aspect ratio of the data. The dependence on γ is shown to be u…

Cited by 65 publications (53 citation statements)
References: 21 publications
“…Grinch makes heavy use of nearest neighbor search under the linkage function f . Rather than perform nearest neighbor search anew for each graft, when a data point arrives, we perform a single k-NN search (k ∈[25, 50]) and only consider these nodes during subsequent grafts (until the next data point arrives). Ablation.…”
mentioning
confidence: 99%
“…Source code is at: https://github.com/tyler-hayes/ExStream. 2) Online k-means: This is a partitioning-based heuristic for an online variant of the traditional k-means clustering algorithm [49]. This heuristic is sometimes referred to as Learning Vector Quantization [43].…”
Section: Memory Efficient Rehearsal
mentioning
confidence: 99%
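
The online k-means heuristic mentioned in the statement above is commonly implemented as a running-mean update: each arriving point is assigned to its nearest center, and that center moves toward the point by a step of 1/count. The sketch below assumes that reading; the function name `online_kmeans` and the choice to seed the centers with the first k points are illustrative assumptions, not details taken from the citing paper.

```python
import numpy as np

def online_kmeans(stream, k):
    """Sequential k-means sketch (hypothetical helper, not from the cited work).

    The first k points seed the centers. Every later point is assigned to its
    nearest center, which is then updated as the running mean of its points.
    """
    centers = []   # current center vectors
    counts = []    # number of points absorbed by each center
    labels = []    # cluster identifier emitted for each point, in arrival order
    for x in stream:
        x = np.asarray(x, dtype=float)
        if len(centers) < k:
            # Seeding phase: each of the first k points becomes its own center.
            centers.append(x.copy())
            counts.append(1)
            labels.append(len(centers) - 1)
            continue
        # Squared Euclidean distance to every current center.
        d2 = [float(np.sum((x - c) ** 2)) for c in centers]
        j = int(np.argmin(d2))
        counts[j] += 1
        # Running-mean update: move the center toward x by 1/count.
        centers[j] += (x - centers[j]) / counts[j]
        labels.append(j)
    return np.array(centers), labels

centers, labels = online_kmeans([[0, 0], [10, 10], [0, 2], [10, 12]], k=2)
```

The label for each point is fixed as soon as the point is seen, which matches the online setting, but the number of centers never grows beyond k and the result depends on the arrival order.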
“…A natural approach is to use stochastic gradient methods to optimize the K-means cost [9,34]. Liberty et al [25] design an alternative online K-means algorithm that when processing a point, opts to start a new cluster if the point is far from the current centers. This idea draws inspiration from the algorithm of Charikar et al [11] for the online k-center problem, which also adjusts the current centers when a new point is far away.…”
Section: Related Work
mentioning
confidence: 99%
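
The idea described in the statement above, starting a new cluster when a point is far from the current centers, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the probabilistic opening rule min(d²/facility_cost, 1), the fixed `facility_cost` parameter, and the function name `online_cluster` are all assumptions introduced here.

```python
import numpy as np

def online_cluster(stream, facility_cost, seed=0):
    """Illustrative online clustering sketch (assumed rule, not the published method).

    Each point either joins its nearest existing center or opens a new cluster.
    A new cluster is opened with probability min(d^2 / facility_cost, 1), where
    d^2 is the squared distance to the nearest center, so far-away points are
    more likely to start their own cluster.
    """
    rng = np.random.default_rng(seed)
    centers = []  # centers are never moved once opened
    labels = []   # cluster identifier output before the next point is read
    for x in stream:
        x = np.asarray(x, dtype=float)
        if not centers:
            # The first point always opens the first cluster.
            centers.append(x)
            labels.append(0)
            continue
        d2 = np.array([np.sum((x - c) ** 2) for c in centers])
        j = int(np.argmin(d2))
        p_new = min(d2[j] / facility_cost, 1.0)
        if rng.random() < p_new:
            centers.append(x)
            labels.append(len(centers) - 1)
        else:
            labels.append(j)
    return centers, labels

centers, labels = online_cluster(
    [[0, 0], [0, 0], [100, 100], [100, 100]], facility_cost=1.0
)
```

Unlike the running-mean heuristic, the number of clusters here is not capped at k; it grows with the data, which is consistent with the abstract's bound being stated in terms of more than k clusters. With a fixed `facility_cost` the sketch carries none of the guarantees quoted in the abstract.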