2020
DOI: 10.3390/a13040092

Deterministic Coresets for k-Means of Big Sparse Data

Abstract: Let P be a set of n points in R^d, k ≥ 1 be an integer, and ε ∈ (0, 1) be a constant. An ε-coreset is a subset C ⊆ P with appropriate non-negative weights (scalars) that approximates any given set Q ⊆ R^d of k centers. That is, the sum of squared distances over every point in P to its closest point in Q is the same, up to a factor of 1 ± ε, as the weighted sum of distances from C to the same k centers. If the coreset is small, we can solve problems such as k-means clustering…
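The ε-coreset definition above can be evaluated numerically. The following sketch uses a plain uniform subsample with rescaled weights as a stand-in for C; this is not a guaranteed ε-coreset construction, and all function and variable names here are illustrative, not from the paper. It only shows how the weighted cost of C is compared against the full cost of P for one candidate center set Q:

```python
import numpy as np

def cost(points, weights, centers):
    """Weighted sum of squared distances from each point to its nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((weights * d2.min(axis=1)).sum())

rng = np.random.default_rng(0)
P = rng.normal(size=(1000, 2))   # the n input points in R^d
Q = rng.normal(size=(3, 2))      # one candidate set of k = 3 centers

# Uniform subsample with rescaled weights -- NOT a guaranteed eps-coreset,
# only a stand-in for C to show how the definition is evaluated.
idx = rng.choice(len(P), size=100, replace=False)
C, w = P[idx], np.full(100, len(P) / 100)

full = cost(P, np.ones(len(P)), Q)
rel_err = abs(cost(C, w, Q) - full) / full  # empirical 1 +/- eps gap for this Q
```

A true ε-coreset would bound `rel_err` by ε simultaneously for *every* choice of Q, which is what the deterministic construction in the paper guarantees.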

Cited by 6 publications (4 citation statements) · References 18 publications
“…[31] used a two-stage strategy for clustering, combining hierarchical and non-hierarchical clustering, and came to the same conclusion about obtaining better outcomes. [32] suggested using self-organizing maps (i.e., model-oriented) to evaluate the clusters produced by the k-means algorithm.…”
Section: Two-Stage Clustering and Data Size
confidence: 99%
“…However, these coresets are exponential in the dimension d of the input. Recently, a deterministic coreset of size independent of d was suggested (Barger & Feldman, 2020).…”
Section: Accurate Coresets
confidence: 99%
“…In particular, we can always improve a given k-clustering by replacing the center of each cluster with its mean (if this is not already the case). This is indeed the idea behind the classic Lloyd's heuristic [Llo82] and also behind some coresets for k-means [BF20]. Most coreset construction algorithms for these hard problems borrow or generalize tricks and techniques used in coreset constructions for the (simpler) mean problem.…”
Section: Introduction
confidence: 99%
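The improvement step described in the excerpt above (replace each cluster's center with its mean) can be sketched as a single Lloyd iteration; the function names here are illustrative, not from [Llo82] or [BF20]. Because the mean minimizes the within-cluster sum of squared distances, this step can never increase the k-means cost:

```python
import numpy as np

def kmeans_cost(P, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def improve_by_means(P, centers):
    """One Lloyd step: assign each point to its nearest center,
    then replace every center by the mean of its assigned cluster."""
    labels = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    new = centers.copy()
    for j in range(len(centers)):
        if (labels == j).any():
            new[j] = P[labels == j].mean(axis=0)
    return new

rng = np.random.default_rng(1)
P = rng.normal(size=(200, 2))
centers = rng.normal(size=(4, 2))
better = improve_by_means(P, centers)
# kmeans_cost(P, better) <= kmeans_cost(P, centers) always holds
```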
“…A coreset that introduces multiplicative 1 + ε error for this problem (SVD/linear regression) was suggested in [FVR16]; there, too, the authors suggested a reduction to the problem of computing a mean coreset with multiplicative 1 + ε error for a set of points in a higher-dimensional space. Another example is in the context of k-means, where [BF20] showed that to compute a k-means coreset for a set of points P, it suffices to cluster these points into a large number of clusters, compute a mean coreset for each cluster, and take the union of these coresets into a single set, which is proven to be a k-means coreset for P.…”
Section: Introduction
confidence: 99%
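The reduction attributed to [BF20] in the excerpt above — over-cluster P into many clusters, summarize each cluster, and take the union — can be sketched schematically. For brevity this sketch keeps only each cluster's weighted mean rather than a full mean coreset per cluster, so it illustrates the pipeline, not the provable construction; all names are our own:

```python
import numpy as np

def many_cluster_summary(P, m, iters=10, seed=0):
    """Schematic sketch: over-cluster P into m >> k clusters with a few
    Lloyd iterations, then summarize each cluster by its mean, weighted by
    the cluster size.  [BF20] replaces each cluster with a provable mean
    coreset instead of a single mean; this sketch only shows the pipeline."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), m, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        labels = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(2).argmin(1)
        for j in range(m):
            if (labels == j).any():
                centers[j] = P[labels == j].mean(axis=0)
    # final assignment so weights match the returned centers
    labels = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(2).argmin(1)
    weights = np.bincount(labels, minlength=m).astype(float)
    keep = weights > 0  # drop empty clusters
    return centers[keep], weights[keep]

P = np.random.default_rng(2).normal(size=(500, 2))
C, w = many_cluster_summary(P, m=50)  # 50 weighted points summarizing 500
```

The weights sum to |P|, so the summary preserves total mass, which is what lets the union of per-cluster summaries stand in for P in the k-means cost.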