Nathan S. Netanyahu scite author profile

S. Arya, et al.

Consider a set S of n data points in real d-dimensional space, R^d, where distances are measured using any Minkowski metric. In nearest neighbor searching, we preprocess S into a data structure so that, given any query point q ∈ R^d, the closest point of S to q can be reported quickly. Given any positive real ε, a data point p is a (1 + ε)-approximate nearest neighbor of q if its distance from q is within a factor of (1 + ε) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in R^d in O(dn log n) time and O(dn) space, so that given a query point q ∈ R^d and ε > 0, a (1 + ε)-approximate nearest neighbor of q can be computed in O(c_{d,ε} log n) time, where c_{d,ε} is a factor depending only on dimension and ε. In general, we show that given an integer k ≥ 1, (1 + ε)-approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
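The definition of a (1 + ε)-approximate nearest neighbor can be made concrete with a brute-force check. The sketch below is only an illustration of the definition, not the data structure from the paper: it finds the exact nearest neighbor by linear scan and tests whether a candidate is within the (1 + ε) factor. The function names (`minkowski`, `nearest_neighbor`, `is_approx_nn`) and the sample points are hypothetical, chosen for this example.

```python
import math

def minkowski(a, b, p=2.0):
    """Minkowski (L_p) distance between two equal-length coordinate tuples."""
    if math.isinf(p):
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def nearest_neighbor(points, q, p=2.0):
    """Exact nearest neighbor of q by linear scan (no preprocessing)."""
    return min(points, key=lambda s: minkowski(s, q, p))

def is_approx_nn(points, q, candidate, eps, p=2.0):
    """True if candidate is a (1 + eps)-approximate nearest neighbor of q."""
    best = minkowski(nearest_neighbor(points, q, p), q, p)
    return minkowski(candidate, q, p) <= (1.0 + eps) * best

# Hypothetical sample data: the true nearest neighbor of q is (0, 0).
S = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
q = (0.45, 0.0)

# (1, 0) is about 1.22 times farther from q than (0, 0), so it qualifies
# for eps = 0.25 but not for eps = 0.1.
loose = is_approx_nn(S, q, (1.0, 0.0), eps=0.25)
tight = is_approx_nn(S, q, (1.0, 0.0), eps=0.1)
```

The linear scan costs O(dn) per query; the point of the result in the abstract is that, after preprocessing, an answer satisfying this same predicate can be found in time logarithmic in n.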


A local search approximation algorithm for k-means clustering

Kanungo, Mount, Netanyahu, et al. 2004. Computational Geometry.



In k-means clustering we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9 − ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.
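The idea of swapping centers in and out can be illustrated with a minimal single-swap local search. This is a sketch under stated assumptions, not the algorithm as specified in the paper: candidate centers are restricted to the data points themselves, the initial centers are simply the first k points, a swap is accepted whenever it lowers the cost by a small relative margin (`min_gain`, a hypothetical parameter), and the combination with Lloyd's algorithm is omitted. No approximation guarantee is claimed for this simplified version.

```python
import itertools

def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cost(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)

def single_swap_search(points, k, min_gain=1e-9):
    """Single-swap local search over centers drawn from the data points.

    Assumption: candidates are the data points; min_gain is the relative
    improvement a swap must achieve to be accepted.
    """
    centers = list(points[:k])  # arbitrary starting centers
    current = cost(points, centers)
    improved = True
    while improved:
        improved = False
        # Try replacing center i with each candidate point.
        for i, cand in itertools.product(range(k), points):
            if cand in centers:
                continue
            trial = centers[:i] + [cand] + centers[i + 1:]
            trial_cost = cost(points, trial)
            if trial_cost < current * (1.0 - min_gain):
                centers, current = trial, trial_cost
                improved = True
                break  # restart the scan from the new solution
    return centers, current

# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, final_cost = single_swap_search(points, k=2)
```

Each accepted swap strictly lowers the cost, and there are finitely many center sets, so the loop terminates. On the sample data the search starts with both centers in the same group and ends with one center in each.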



Nathan S. Netanyahu

An efficient k-means clustering algorithm: analysis and implementation

An optimal algorithm for approximate nearest neighbor searching fixed dimensions

A local search approximation algorithm for k-means clustering
