2017
DOI: 10.1016/j.ipl.2016.11.009

Improved and simplified inapproximability for k-means

Abstract: The k-means problem consists of finding k centers in R^d that minimize the sum of the squared distances of all points in an input set P from R^d to their closest respective center. Awasthi et al. recently showed that there exists a constant c > 1 such that it is NP-hard to approximate the k-means objective to within a factor of c. We establish that this constant c is at least 1.0013. For a given set of points P ⊂ R^d, the k-means problem consists of finding a partition of P into k clusters (C_1, …, C_k) …
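For reference, the objective described in the abstract can be written out explicitly. This is the standard k-means formulation (the notation below is mine, not quoted from the paper): given centers c_1, …, c_k in R^d,

\[
  \operatorname{cost}(P, \{c_1, \dots, c_k\}) \;=\; \sum_{p \in P} \min_{1 \le j \le k} \lVert p - c_j \rVert_2^2 ,
\]

or equivalently, over partitions of P into clusters (C_1, \dots, C_k) with centroids \mu_i = \frac{1}{|C_i|} \sum_{p \in C_i} p,

\[
  \operatorname{cost}(C_1, \dots, C_k) \;=\; \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - \mu_i \rVert_2^2 .
\]

The two forms agree because, for a fixed cluster, the centroid is the point minimizing the sum of squared distances to the cluster's members.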

Cited by 72 publications (67 citation statements, published 2017–2023)
References 9 publications
“…Their analysis also shows that no natural local search algorithm performing a fixed number of swaps can improve upon this ratio. This leads to a barrier for these techniques that is rather far away from the best-known inapproximability result, which only says that it is NP-hard to approximate this problem to within a factor better than 1.0013 [20].…”
Section: Introduction (mentioning; confidence: 97%)
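The local search referred to in this statement maintains k centers and repeatedly swaps centers for candidate points. As a concrete illustration only, here is a minimal sketch of the single-swap, discrete variant (centers restricted to input points); it is my own sketch, not the algorithm analyzed in the cited works:

import itertools

def kmeans_cost(points, centers):
    # Sum over all points of the squared Euclidean distance to the nearest center.
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
        for p in points
    )

def single_swap_local_search(points, k):
    # Start from an arbitrary feasible solution: the first k input points.
    centers = list(points[:k])
    improved = True
    while improved:
        improved = False
        # Try swapping each current center for each candidate point;
        # keep any swap that strictly lowers the cost.
        for i, cand in itertools.product(range(k), points):
            trial = centers[:i] + [cand] + centers[i + 1:]
            if kmeans_cost(points, trial) < kmeans_cost(points, centers):
                centers, improved = trial, True
    return centers

# Example: two well-separated pairs; with k = 2 the search ends with one
# center in each pair.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(single_swap_local_search(points, 2))

Multi-swap variants exchange several centers at once; the statement above is that no variant performing a fixed number of swaps can beat the known approximation ratio.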
“…Dasgupta first showed that the problem is NP-hard in large dimensions [17]. A recent work of Awasthi et al. [18] showed the APX-hardness of the k-means problem in the Euclidean metric, and the inapproximability bound was recently improved to 1.0013 by Lee et al. [19]. Yet, we do not know of a better approximation algorithm for the continuous version, and so the best known approximation algorithm achieves a 6.47-approximation.…”
Section: Introduction (mentioning; confidence: 97%)
“…However, the hardness of approximation for these problems is very close to 1 and, combined with the loss induced by the embedding, this cannot lead to a hardness greater than 1.01. For example, the recent approach of Awasthi et al. [18] and Lee et al. [19] is a reduction from vertex cover on triangle-free graphs which introduces a direct embedding for the k-means problem. Unfortunately, the gap of the reduction is also a function of the degree of the input graph, and so requires that the instance of vertex cover has bounded degree.…”
Section: Introduction (mentioning; confidence: 99%)
“…Traditionally, the theory of clustering (and more generally, the theory of algorithms) has focused on the analysis of worst-case instances [Arya et al., 2004, Byrka et al., 2015, Charikar et al., 1999, 2001, Chen, 2008, Gonzalez, 1985, Makarychev et al., 2016]. For example, it is well known that the popular objective functions are provably NP-hard to optimize exactly or even approximately (APX-hard) [Gonzalez, 1985, Jain et al., 2002, Lee et al., 2017], so research has focused on finding approximation algorithms. While this perspective has led to many elegant approximation algorithms and lower bounds for worst-case instances, it is often overly pessimistic about an algorithm's performance on "typical" instances or real-world instances.…”
Section: Introduction (mentioning; confidence: 99%)