We consider the classic Facility Location, k-Median, and k-Means problems in metric spaces of doubling dimension $d$. We give nearly linear-time approximation schemes for each problem. The complexity of our algorithms is $2^{(\log(1/\varepsilon)/\varepsilon)^{O(d^2)}} n \log^4 n + 2^{O(d)} n \log^9 n$, making a significant improvement over the state-of-the-art algorithms, which run in time $n^{(d/\varepsilon)^{O(d)}}$. Moreover, we show how to extend the techniques used to get the first efficient approximation schemes for the problems of prize-collecting k-Median and k-Means, and efficient bicriteria approximation schemes for k-Median with outliers, k-Means with outliers, and k-Center.

[...] help to handle some noise from the input: the k-Median objective can be dramatically perturbed by the addition of a few distant clients, which must then be discarded.
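To illustrate the sensitivity to distant clients noted above, here is a minimal sketch on a toy one-dimensional instance. The points, the centers, and the helper name `k_median_cost` are illustrative assumptions and do not come from the paper. The code only evaluates the objective for a fixed set of centers; it is not the approximation scheme itself.

```python
def k_median_cost(clients, centers):
    """Sum over clients of the distance to the closest center (1-D toy metric)."""
    return sum(min(abs(c - s) for s in centers) for c in clients)


# Hypothetical toy instance: two tight groups on the line, one center in each.
clients = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
centers = [1.0, 11.0]

base = k_median_cost(clients, centers)

# Add a single distant client: the same centers now pay its full distance.
with_outlier = clients + [1000.0]
noisy = k_median_cost(with_outlier, centers)

# Discarding the distant client, as the outlier variants allow, restores the cost.
trimmed = k_median_cost([c for c in with_outlier if c < 100.0], centers)

print(base, noisy, trimmed)  # 4.0 993.0 4.0
```

On this toy instance a single added client raises the cost from 4 to 993, and discarding it brings the cost back to 4.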
Our results

We solve this open problem by proposing the first near-linear time algorithms for the k-Median and k-Means problems in metrics of fixed doubling dimension. More precisely, we show the following theorems, where we let $f(\varepsilon) = (1/\varepsilon)^{1/\varepsilon}$.

Theorem 1.1. For any $0 < \varepsilon < 1/3$, there exists a randomized $(1+\varepsilon)$-approximation algorithm for k-Median in metrics of doubling dimension $d$ with running time $f(\varepsilon)^{2^{O(d^2)}} n \log^4 n + 2^{O(d)} n \log^9 n$ and success probability at least $1-\varepsilon$.

Theorem 1.2. For any $0 < \varepsilon < 1/3$, there exists a randomized $(1+\varepsilon)$-approximation algorithm for k-Means in metrics of doubling dimension $d$ with running time $f(\varepsilon)^{2^{O(d^2)}} n \log^5 n + 2^{O(d)} n \log^9 n$ and success probability at least $1-\varepsilon$.

Our results also extend to the Facility Location problem, in which no bound on the number of opened centers is given, but each center comes with an opening cost. The aim is to minimize the sum of the distances (to the first power) from each point of the metric to its closest center, plus the total opening cost of all used centers.

Theorem 1.3. For any $0 < \varepsilon < 1/3$, there exists a randomized $(1+\varepsilon)$-approximation algorithm for Facility Location in metrics of doubling dimension $d$ with running time $f(\varepsilon)^{2^{O(d^2)}} \cdot n + 2^{O(d)} n \log n$ and success probability at least $1-\varepsilon$.

In all these theorems, we make the common assumption that the distances of the metric can be accessed in constant time, as, e.g., in [18,27,29]. This assumption is discussed in Bartal et al. [9].

Note that the double-exponential dependence on $d$ is unavoidable unless P = NP, since the problems are APX-hard in Euclidean space of dimension $d = O(\log n)$. For Euclidean inputs, our algorithms for the k-Means and k-Median problems outperform the ones of Cohen-Addad [15], removing in particular the dependence on $k$, and the one of Kolliopoulos and Rao [32] when $d > 3$, by removing the dependence on $\log^{d+6} n$.
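The objectives behind the three theorems can be made concrete with a small sketch. It assumes points in the Euclidean plane and uses hypothetical helper names (`clustering_cost`, `facility_location_cost`) and a made-up toy instance. It only evaluates a given solution and says nothing about how the algorithms of the paper find one.

```python
import math


def clustering_cost(points, centers, power=1):
    """Sum of (distance to closest center) ** power.

    power=1 is the k-Median objective, power=2 the k-Means objective.
    """
    return sum(min(math.dist(p, c) for c in centers) ** power for p in points)


def facility_location_cost(points, opened, opening_cost):
    """Connection cost (distances to the first power) plus opening costs."""
    connection = sum(min(math.dist(p, f) for f in opened) for p in points)
    return connection + sum(opening_cost[f] for f in opened)


# Hypothetical toy instance in the plane.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
centers = [(0.0, 0.0), (6.0, 8.0)]
opening_cost = {(0.0, 0.0): 2.0, (6.0, 8.0): 3.0}

k_median = clustering_cost(points, centers, power=1)
k_means = clustering_cost(points, centers, power=2)
facility = facility_location_cost(points, centers, opening_cost)

print(k_median, k_means, facility)  # 5.0 25.0 10.0
```

Only the middle point is at nonzero distance (5) from its closest center, so the k-Median cost is 5, the k-Means cost is 25, and the Facility Location cost is 5 plus the opening costs 2 + 3.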
Interestingly, for $k = \omega(\log^9 n)$ our algorithm for the k-Means problem is faster than popular heuristics like k-Means++, which runs in time $O(nk)$ in Euclidean space. We note that the success probability can be boosted to $1-\varepsilon^{\delta}$ by repeating the algorithm $\log \delta$ times and outputting the best solution encountered.

After proving the three theorems above, we will a...