The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory, as the bounds are still super-polynomial in the number n of data points. In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
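For reference, the k-means method (Lloyd's algorithm) alternates an assignment step and a center-update step until no center moves; the iterations counted by the bound above are exactly these rounds. Below is a minimal Python/NumPy sketch; the initialization rule and the handling of empty clusters are our own choices, since the analysis applies to arbitrary initial centers.

```python
import numpy as np

def kmeans(points, k, rng=None):
    """One run of the k-means method (Lloyd's algorithm) on an (n, d)
    array of points; returns the final centers and the number of
    iterations, the quantity the smoothed bounds count."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Initialize on k distinct input points (one common choice; the
    # smoothed bounds hold for arbitrary initial centers).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    iterations = 0
    while True:
        iterations += 1
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to its cluster's centroid;
        # empty clusters keep their old center (our own convention).
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):  # no center moved: done
            return new_centers, iterations
        centers = new_centers
```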
Probabilistic analysis of metric optimization problems has mostly been conducted on random Euclidean instances, but little is known about metric instances drawn from distributions other than the Euclidean one. This motivates our study of random metric instances for optimization problems, obtained as follows: every edge of a complete graph gets a weight drawn independently at random, and the distance between two nodes is the length of a shortest path (with respect to the weights drawn) that connects these nodes. We prove structural properties of the random shortest path metrics generated in this way. Our main structural contribution is the construction of a good clustering. We then apply these findings to analyze the approximation ratios of heuristics for matching, the traveling salesman problem (TSP), and the k-median problem, as well as the running time of the 2-opt heuristic for the TSP. The bounds that we obtain are considerably better than the respective worst-case bounds. This suggests that random shortest path metrics are easy instances, similar to random Euclidean instances, albeit for completely different structural reasons.
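Such a random shortest path metric can be generated directly from its definition. The sketch below uses Exp(1) edge weights as a concrete stand-in (an assumption on our part; the abstract only asks for independently drawn weights) and closes them under shortest paths with Floyd-Warshall.

```python
import numpy as np

def random_shortest_path_metric(n, rng=None):
    """Draw a random shortest path metric on n nodes: every edge of
    the complete graph K_n gets an independent random weight (Exp(1)
    here), and d(u, v) is the length of a shortest u-v path."""
    if rng is None:
        rng = np.random.default_rng(0)
    w = rng.exponential(1.0, size=(n, n))  # independent edge weights
    d = np.triu(w, 1)                      # keep one weight per edge
    d = d + d.T                            # symmetrize, zero diagonal
    # Floyd-Warshall: shortcut paths until the triangle inequality holds.
    for m in range(n):
        d = np.minimum(d, d[:, m:m + 1] + d[m:m + 1, :])
    return d
```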
The k-means method is a widely used clustering algorithm. One of its distinguishing features is its speed in practice. Its worst-case running time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii [3] aimed at closing this gap, and they proved a bound of poly(n^k, σ^{-1}) on the smoothed running time of the k-means method, where n is the number of data points and σ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running time observed in practice. We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running time of k-means. First, we prove that the expected running time is bounded by a polynomial in n^{√k} and σ^{-1}. Second, we prove an upper bound of k^{kd} · poly(n, σ^{-1}), where d is the dimension of the data space. The polynomial is independent of k and d, and we obtain a polynomial bound for the expected running time for k, d ∈ O(√(log n / log log n)). Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances.
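The smoothed model behind all of these bounds perturbs an adversarial input with independent Gaussian noise of standard deviation σ before k-means runs. Here is a small sketch of that experiment, reusing the kmeans function above; the trial count and the sample-mean estimate of the expectation are illustrative choices, not part of the analysis.

```python
import numpy as np

def smoothed_iterations(adversarial_points, sigma, k, trials=100, seed=1):
    """Estimate the expected number of k-means iterations when each
    coordinate of an adversarial instance is perturbed by independent
    N(0, sigma^2) noise, i.e. the smoothed quantity bounded above."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(trials):
        noise = rng.normal(0.0, sigma, size=adversarial_points.shape)
        _, iters = kmeans(adversarial_points + noise, k, rng)
        counts.append(iters)
    return float(np.mean(counts))
```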
The minimum-cost flow problem is a classic problem in combinatorial optimization with various applications. Several pseudo-polynomial, polynomial, and strongly polynomial algorithms have been developed in the past decades, and it seems that both the problem and the algorithms are well understood. However, some of the running times observed in empirical studies contrast with the running times obtained by worst-case analysis, not only in their order of magnitude but also in their relative ranking. For example, the Successive Shortest Path (SSP) algorithm, which has an exponential worst-case running time, seems to outperform the strongly polynomial Minimum-Mean Cycle Canceling algorithm. To explain this discrepancy, we study the SSP algorithm in the framework of smoothed analysis and establish a bound of O(mnφ(m + n log n)) for its smoothed running time, where φ is the perturbation parameter. This shows that worst-case instances for the SSP algorithm are not robust and unlikely to be encountered in practice.
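The SSP algorithm itself repeatedly augments flow along a minimum-cost path in the residual network until the demanded flow value is reached, and its running time is governed by the number of such augmentations. The following is a compact Python sketch under simplifying assumptions of our own: at most one arc per ordered node pair, no antiparallel input arcs, and plain Bellman-Ford instead of the potential-based Dijkstra variant common in practical implementations.

```python
import math

def successive_shortest_paths(n, edges, s, t, demand):
    """Minimum-cost s-t flow of value `demand` via the Successive
    Shortest Path (SSP) algorithm. `edges` is a list of
    (u, v, capacity, cost) arcs on nodes 0..n-1."""
    cap, cost = {}, {}
    for u, v, c, w in edges:
        cap[(u, v)] = c
        cap[(v, u)] = 0                  # reverse arc for residual flow
        cost[(u, v)], cost[(v, u)] = w, -w

    total_cost = 0
    while demand > 0:
        # Bellman-Ford from s over arcs with positive residual capacity;
        # it tolerates the negative costs on reverse arcs.
        dist = [math.inf] * n
        pred = [None] * n
        dist[s] = 0
        for _ in range(n - 1):
            for (u, v), c in cap.items():
                if c > 0 and dist[u] + cost[(u, v)] < dist[v]:
                    dist[v] = dist[u] + cost[(u, v)]
                    pred[v] = u
        if dist[t] == math.inf:
            raise ValueError("demanded flow value is infeasible")
        # Bottleneck capacity of the shortest augmenting path.
        delta, v = demand, t
        while v != s:
            delta = min(delta, cap[(pred[v], v)])
            v = pred[v]
        # Augment: update residual capacities and the total cost.
        v = t
        while v != s:
            u = pred[v]
            cap[(u, v)] -= delta
            cap[(v, u)] += delta
            total_cost += delta * cost[(u, v)]
            v = u
        demand -= delta
    return total_cost
```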