2018
DOI: 10.48550/arxiv.1802.00459
Preprint

Nearly Optimal Dynamic $k$-Means Clustering for High-Dimensional Data

Wei Hu,
Zhao Song,
Lin F. Yang
et al.

Abstract: We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted into or deleted from the dataset. For this problem, we provide a one-pass coreset construction algorithm using space $O(k \cdot \mathrm{poly}(d, \log \Delta))$, where $k$ is the target number of centers. To our knowledge, this is the first dynamic geometric data stream algorithm for $k$-means using space polynomial in dimension and nearly optimal (linear) in $k$.
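
To make the setting in the abstract concrete, here is a minimal Python sketch of the dynamic streaming model and the $k$-means cost. It is not the paper's coreset construction: it stores every point exactly, so it uses space linear in the dataset rather than $O(k \cdot \mathrm{poly}(d, \log \Delta))$. The names `apply_stream` and `kmeans_cost` and the `'+'`/`'-'` update encoding are assumptions made for illustration.

```python
from collections import Counter


def apply_stream(updates):
    """Replay (op, point) updates, where op is '+' (insert) or '-' (delete).

    Returns the multiset of points that remain, as point -> multiplicity.
    Hypothetical helper: it keeps every point, unlike a small-space sketch.
    """
    counts = Counter()
    for op, point in updates:
        counts[point] += 1 if op == "+" else -1
        if counts[point] == 0:
            del counts[point]
    return counts


def kmeans_cost(weighted_points, centers):
    """Sum over points of weight * squared distance to the nearest center."""
    total = 0.0
    for point, weight in weighted_points.items():
        nearest = min(
            sum((p - c) ** 2 for p, c in zip(point, center))
            for center in centers
        )
        total += weight * nearest
    return total


# Points live on the grid {1, ..., 8}^2; (5, 5) is inserted and later deleted.
updates = [
    ("+", (1, 1)),
    ("+", (2, 1)),
    ("+", (8, 8)),
    ("+", (5, 5)),
    ("-", (5, 5)),
]
dataset = apply_stream(updates)
cost = kmeans_cost(dataset, centers=[(1, 1), (8, 8)])
```

Because `kmeans_cost` accepts arbitrary positive weights, the same function could evaluate a weighted coreset in place of the full dataset; a coreset is useful when the two costs agree up to a small multiplicative error for every choice of centers.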

Cited by 4 publications (9 citation statements)
References: 35 publications
“…In this paper, we suppose all input and output points are in $\{1, 2, \ldots, \Delta\}^d$ for some $\Delta, d \in \mathbb{Z}_{\geq 1}$. This assumption is without loss of generality since if the clustering cost is non-zero, we can always discretize the space by changing the cost by an arbitrary small multiplicative error [BFL+17, HSYZ18]. Given a point set $Q \in [\Delta]^d$, a strong (η, )-coreset of Q for capacitated k-clustering in r is a subset of points Q ⊆ Q with weights $w : Q \to \mathbb{R}_{>0}$ such that for any capacity $t \geq |Q|/k$ and any set of…”
Section: Our Results (mentioning)
confidence: 99%
“…Let $F$ denote the event that the total number of center cells is at most $2000kL$. Lemma 3.2 (Lemma 14 of [HSYZ18]). $F$ happens with probability at least $0.99$.…”
Section: Points Partitioning (mentioning)
confidence: 99%
“…The offline construction is almost the same as the algorithm proposed by [HSYZ18]. We put all analysis into Appendix B for completeness.…”
Section: Offline Coreset Construction (mentioning)
confidence: 99%
“…The goal is to prove Theorem 5.2. The analysis is similar to [HSYZ18]. We include the analysis in this section for completeness.…”
Section: A1 Smooth Function and Smooth Histogram (mentioning)
confidence: 99%