Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing 2019
DOI: 10.1145/3313276.3316318
Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma

Abstract: We show that for n points in d-dimensional Euclidean space, a data-oblivious random projection of the columns onto m ∈ O((log k + log log n) · ε⁻⁶ log(1/ε)) dimensions is sufficient to approximate the cost of all k-means clusterings up to a multiplicative (1 ± ε) factor. The previous-best upper bounds on m are O(ε⁻² log n), given by a direct application of the Johnson-Lindenstrauss Lemma, and O(ε⁻² k), given by . We also prove the existence of a non-oblivious cost-preserving sketch with target dimen- . Furthermore, …
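As a rough illustration of the abstract's claim (a plain Gaussian sketch applied to an arbitrary fixed clustering, with made-up sizes — not the paper's construction): a random projection drawn independently of the data approximately preserves the k-means cost of a given labeling.

```python
import numpy as np

def kmeans_cost(X, labels, k):
    # Sum of squared distances from each point to its cluster centroid.
    cost = 0.0
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

rng = np.random.default_rng(0)
n, d, k, m = 500, 200, 5, 40          # illustrative sizes, not the paper's bounds
X = rng.normal(size=(n, d))
labels = rng.integers(0, k, size=n)   # an arbitrary fixed k-clustering

# Data-oblivious projection: G is drawn without looking at X.
G = rng.normal(size=(d, m)) / np.sqrt(m)
Y = X @ G

ratio = kmeans_cost(Y, labels, k) / kmeans_cost(X, labels, k)
print(round(ratio, 2))  # typically close to 1
```

Because cluster centroids are linear functions of the points, projecting the points also projects the centroids, so the cost in the sketch concentrates around the original cost.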

Cited by 47 publications (30 citation statements) · References 49 publications
“…Since rank(x) ≤ k for any x ∈ X (by definition), we also have rank(Y ) ≤ k in Eq. (6). We can therefore relax the constraints xx P = Y to rank(Y ) ≤ k. By [26], the solution to this relaxation is given by the truncated singular value decomposition of the matrix P .…”
Section: Application of RPs to the MSSC
confidence: 99%
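The rank-constrained relaxation in the quoted passage is solved by truncated SVD because of the Eckart-Young theorem; a minimal numpy sketch (the matrix P and rank k here are illustrative, not taken from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(8, 6))  # stand-in for the matrix P of the quoted passage
k = 2

# Rank-k truncated SVD: by Eckart-Young, P_k is the closest matrix of rank
# at most k to P in Frobenius norm, which is why the relaxed problem
# "minimize subject to rank(Y) <= k" is solved by truncation.
U, s, Vt = np.linalg.svd(P, full_matrices=False)
P_k = (U[:, :k] * s[:k]) @ Vt[:k]

# Any other rank-k candidate is no closer to P:
Q = rng.normal(size=(8, k)) @ rng.normal(size=(k, 6))
assert np.linalg.norm(P - P_k) <= np.linalg.norm(P - Q) + 1e-9
```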
“…This result was improved in [18] to a (1 ± ε) approximation error; the same paper also presents a 9 + ε approximation error with O(ε⁻² log k) dimensions. It was shown in [6] that there is an RP mapping to O((log k + log log n) · ε⁻⁶ log(1/ε)) dimensions that preserves the cost of any k-clustering up to a (1 ± ε) approximation error. It was also shown that it is possible to further reduce the projected dimension to O(log k + log(1−δ)…”
Section: Application of RPs to the MSSC
confidence: 99%
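Suppressing constants, the two bounds above can be compared numerically. In this sketch all constant factors are set to 1 (an assumption, so only growth rates are meaningful); it shows that the bound of [6] grows only doubly-logarithmically in n, whereas the direct JL bound grows logarithmically:

```python
import math

def m_jl(n, eps):
    # O(log n / eps^2): direct Johnson-Lindenstrauss bound, constant dropped.
    return math.log(n) / eps**2

def m_new(n, k, eps):
    # O((log k + log log n) * eps^-6 * log(1/eps)): bound of [6], constant dropped.
    return (math.log(k) + math.log(math.log(n))) * math.log(1 / eps) / eps**6

k, eps = 10, 0.5
# Growth when n goes from 1e9 to 1e18: JL doubles; the new bound barely moves.
jl_growth = m_jl(10**18, eps) / m_jl(10**9, eps)
new_growth = m_new(10**18, k, eps) / m_new(10**9, k, eps)
print(round(jl_growth, 2), round(new_growth, 2))  # jl_growth is exactly 2.0
```

Note the trade-off visible in the formulas: the [6] bound wins for large n and small k, but its ε⁻⁶ dependence is much worse than JL's ε⁻².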
“…Together with even a general coreset construction of size poly(k, log n, 1/ε), one already gets an EPAS with parameter k. Better coreset constructions are also given in Euclidean spaces. Recent developments [176][177][178] construct coresets of size poly(k, 1/ε) (no dependence on n or d), which is further extended to the shortest-path metric of an excluded-minor graph [179].…”
Section: Euclidean k-median and Euclidean k-means with Parameter k
confidence: 99%
“…Data summarizations similar to coresets, of size O(k/ε) and based on projections onto low-dimensional subspaces that diminish the sparsity of the input data, were suggested by [14], improving the analysis of [4]. Recently, [15] improved on both [4] and [14] by applying the Johnson-Lindenstrauss Lemma [16] to the construction from [4]. However, due to the projections, the resulting summarizations of all the works mentioned above are not subsets of the input points, unlike the coreset definition of this paper.…”
Section: Importance Sampling
confidence: 99%
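The importance-sampling idea in the quoted passage can be illustrated with a minimal sensitivity-style sketch (simplified here to the cost around a single centroid; the data and sample size are made up for illustration): sample points with probability proportional to their cost contribution and reweight by the inverse probability, producing a weighted *subset* of the input whose cost estimates the full cost — in contrast to projection-based summaries, which are not subsets of the input.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))  # made-up input point set

# Sensitivity-style importance sampling, 1-means case: each point's
# sampling probability is proportional to its contribution to the cost
# around the mean; sampled points are reweighted by 1/(size * p) so the
# weighted coreset cost is an unbiased estimate of the true cost.
mu = X.mean(axis=0)
contrib = ((X - mu) ** 2).sum(axis=1) + 1e-12
p = contrib / contrib.sum()
size = 100
idx = rng.choice(len(X), size=size, replace=True, p=p)
w = 1.0 / (size * p[idx])

true_cost = ((X - mu) ** 2).sum()
coreset_cost = (w * ((X[idx] - mu) ** 2).sum(axis=1)).sum()
ratio = coreset_cost / true_cost
print(round(ratio, 4))
```

For this fixed centroid the reweighting cancels exactly, so the ratio is 1 up to floating-point noise; the substance of sensitivity sampling is that one distribution works simultaneously for all candidate clusterings, which this toy sketch does not demonstrate.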