Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing 2019
DOI: 10.1145/3313276.3316318
Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma

Abstract: We show that for n points in d-dimensional Euclidean space, a data-oblivious random projection of the columns onto m ∈ O((log k + log log n) · ε⁻⁶ log(1/ε)) dimensions is sufficient to approximate the cost of all k-means clusterings up to a multiplicative (1 ± ε) factor. The previous-best upper bounds on m are O(ε⁻² log n), given by a direct application of the Johnson-Lindenstrauss Lemma, and O(ε⁻² k), given by . We also prove the existence of a non-oblivious cost-preserving sketch with target dimen- . Furthermore, …
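As a rough illustration of the abstract's claim (a plain Gaussian sketch applied to an arbitrary fixed clustering, with made-up sizes — not the paper's construction): a random projection drawn independently of the data approximately preserves the k-means cost of a given labeling.

```python
import numpy as np

def kmeans_cost(X, labels, k):
    # Sum of squared distances from each point to its cluster centroid.
    cost = 0.0
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

rng = np.random.default_rng(0)
n, d, k, m = 500, 200, 5, 40          # illustrative sizes, not the paper's bounds
X = rng.normal(size=(n, d))
labels = rng.integers(0, k, size=n)   # an arbitrary fixed k-clustering

# Data-oblivious projection: G is drawn without looking at X.
G = rng.normal(size=(d, m)) / np.sqrt(m)
Y = X @ G

ratio = kmeans_cost(Y, labels, k) / kmeans_cost(X, labels, k)
print(round(ratio, 2))  # typically close to 1
```

Because cluster centroids are linear functions of the points, projecting the points also projects the centroids, so the cost in the sketch concentrates around the original cost.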

Cited by 47 publications (30 citation statements) · References 49 publications
“…Since rank(x) ≤ k for any x ∈ X (by definition), we also have rank(Y ) ≤ k in Eq. (6). We can therefore relax the constraints xx P = Y to rank(Y ) ≤ k. By [26], the solution to this relaxation is given by the truncated singular value decomposition of the matrix P .…”
Section: Application of RPs to the MSSC
confidence: 99%
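The rank-constrained relaxation in the quoted passage is solved by truncated SVD because of the Eckart-Young theorem; a minimal numpy sketch (the matrix P and rank k here are illustrative, not taken from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(8, 6))  # stand-in for the matrix P of the quoted passage
k = 2

# Rank-k truncated SVD: by Eckart-Young, P_k is the closest matrix of rank
# at most k to P in Frobenius norm, which is why the relaxed problem
# "minimize subject to rank(Y) <= k" is solved by truncation.
U, s, Vt = np.linalg.svd(P, full_matrices=False)
P_k = (U[:, :k] * s[:k]) @ Vt[:k]

# Any other rank-k candidate is no closer to P:
Q = rng.normal(size=(8, k)) @ rng.normal(size=(k, 6))
assert np.linalg.norm(P - P_k) <= np.linalg.norm(P - Q) + 1e-9
```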
“…This result was improved in [18] to a (1 ± ε) approximation error; the same paper also presents a 9 + ε approximation error with O(ε⁻² log k) dimensions. It was shown in [6] that there is an RP mapping to O((log k + log log n) · ε⁻⁶ log(1/ε)) dimensions that preserves the cost of any k-clustering up to a (1 ± ε) approximation error. It was also shown that it is possible to further reduce the projected dimension to O(log k + log(1−δ)…”
Section: Application of RPs to the MSSC
confidence: 99%
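Suppressing constants, the two bounds above can be compared numerically. In this sketch all constant factors are set to 1 (an assumption, so only growth rates are meaningful); it shows that the bound of [6] grows only doubly-logarithmically in n, whereas the direct JL bound grows logarithmically:

```python
import math

def m_jl(n, eps):
    # O(log n / eps^2): direct Johnson-Lindenstrauss bound, constant dropped.
    return math.log(n) / eps**2

def m_new(n, k, eps):
    # O((log k + log log n) * eps^-6 * log(1/eps)): bound of [6], constant dropped.
    return (math.log(k) + math.log(math.log(n))) * math.log(1 / eps) / eps**6

k, eps = 10, 0.5
# Growth when n goes from 1e9 to 1e18: JL doubles; the new bound barely moves.
jl_growth = m_jl(10**18, eps) / m_jl(10**9, eps)
new_growth = m_new(10**18, k, eps) / m_new(10**9, k, eps)
print(round(jl_growth, 2), round(new_growth, 2))  # jl_growth is exactly 2.0
```

Note the trade-off visible in the formulas: the [6] bound wins for large n and small k, but its ε⁻⁶ dependence is much worse than JL's ε⁻².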
“…Together with even a general coreset construction of size poly(k, log n, 1/ε), one already gets an EPAS with parameter k. Better coreset constructions are also given in Euclidean spaces. Recent developments [176][177][178] construct coresets of size poly(k, 1/ε) (no dependence on n or d), which is further extended to the shortest-path metric of an excluded-minor graph [179].…”
Section: Euclidean k-median and Euclidean k-means with Parameter k
confidence: 99%
“…Data summarizations similar to coresets, of size O(k/ε) and based on projections onto low-dimensional subspaces that diminish the sparsity of the input data, were suggested by [14], improving the analysis of [4]. Recently, [15] improved on both [4] and [14] by applying the Johnson-Lindenstrauss Lemma [16] to the construction from [4]. However, due to the projections, the resulting summarizations of all the works mentioned above are not subsets of the input points, unlike the coreset definition of this paper.…”
Section: Importance Sampling
confidence: 99%
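The importance-sampling idea in the quoted passage can be illustrated with a minimal sensitivity-style sketch (simplified here to the cost around a single centroid; the data and sample size are made up for illustration): sample points with probability proportional to their cost contribution and reweight by the inverse probability, producing a weighted *subset* of the input whose cost estimates the full cost — in contrast to projection-based summaries, which are not subsets of the input.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))  # made-up input point set

# Sensitivity-style importance sampling, 1-means case: each point's
# sampling probability is proportional to its contribution to the cost
# around the mean; sampled points are reweighted by 1/(size * p) so the
# weighted coreset cost is an unbiased estimate of the true cost.
mu = X.mean(axis=0)
contrib = ((X - mu) ** 2).sum(axis=1) + 1e-12
p = contrib / contrib.sum()
size = 100
idx = rng.choice(len(X), size=size, replace=True, p=p)
w = 1.0 / (size * p[idx])

true_cost = ((X - mu) ** 2).sum()
coreset_cost = (w * ((X[idx] - mu) ** 2).sum(axis=1)).sum()
ratio = coreset_cost / true_cost
print(round(ratio, 4))
```

For this fixed centroid the reweighting cancels exactly, so the ratio is 1 up to floating-point noise; the substance of sensitivity sampling is that one distribution works simultaneously for all candidate clusterings, which this toy sketch does not demonstrate.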