We present new scalar and matrix Chernoff-style concentration bounds for a broad class of probability distributions over the binary hypercube {0, 1}^n. Motivated by recent tools developed for the study of mixing times of Markov chains on discrete distributions, we say that a distribution is ℓ∞-independent when the ∞→∞ norm of its influence matrix I is bounded by a constant. We show that any ℓ∞-independent distribution satisfies a matrix Chernoff bound that matches the matrix Chernoff bound for independent random variables due to Tropp. Our matrix Chernoff bound is a broad generalization and strengthening of the matrix Chernoff bound of Kyng and Song (FOCS'18). Using our bound, we conclude as a corollary that a union of O(log |V|) random spanning trees gives a spectral graph sparsifier of a graph with |V| vertices with high probability, matching results for independent edge sampling and matching lower bounds from Kyng and Song.
Given a dataset V of points from some metric space, a popular robust formulation of the k-center clustering problem requires selecting k points (centers) of V which minimize the maximum distance of any point of V from its closest center, excluding the z most distant points (outliers) from the computation of the maximum. In this paper, we focus on an important constrained variant of the robust k-center problem, namely the Robust Matroid Center (RMC) problem, where the set of returned centers is constrained to be an independent set of a matroid of rank k built on V. Instantiating the problem with the partition matroid yields a formulation of the fair k-center problem, which has attracted the interest of the ML community in recent years. In this paper, we target accurate solutions of the RMC problem under general matroids, when confronted with large inputs. Specifically, we devise a coreset-based algorithm affording efficient sequential, distributed (MapReduce), and streaming implementations. For any fixed ε > 0, the algorithm returns solutions featuring a (3 + ε)-approximation ratio, which is a mere additive term ε away from the 3-approximation achievable by the best known polynomial-time sequential algorithms. Moreover, the algorithm obliviously adapts to the intrinsic complexity of the dataset, captured by its doubling dimension D. For wide ranges of k, z, ε, D, our MapReduce/streaming implementations require two rounds/one pass and substantially sublinear local/working memory. The theoretical results are complemented by an extensive set of experiments on real-world datasets, which provide clear evidence of the accuracy and efficiency of our algorithms and of their improved performance with respect to previous solutions.
We show new scalar and matrix Chernoff-style concentration bounds for a broad class of probability distributions over {0, 1}^n. Building on developments in high-dimensional expanders (Kaufman and Mass ITCS'17, Dinur and Kaufman FOCS'17, Kaufman and Oppenheim Combinatorica'20) and matroid theory (Adiprasito et al. Ann. Math.'18), a breakthrough result of Anari, Liu, Oveis Gharan, and Vinzant (STOC'19) showed that the up-down random walk on matroid bases has polynomial mixing time, making it possible to efficiently sample from the (weighted) uniform distribution over matroid bases. Since then, there has been a flurry of related work proving polynomial mixing times for random walks used to sample from a wide range of discrete probability distributions. Many works have observed that, as a corollary of their mixing time analysis, one can obtain scalar concentration for 1-Lipschitz functions of samples from the stationary distribution via standard arguments that convert bounds on the modified log-Sobolev (MLS) or Poincaré constant of a random walk into concentration results for the associated stationary distribution of the random walk, see for example Hermon and Salez (arXiv'19). Several recent works have considered a matrix analog of the Poincaré inequality for a random walk with an associated stationary distribution. Using this matrix Poincaré inequality, these works have derived a concentration result for matrix-valued spectral norm-Lipschitz functions of samples from the distribution, see for example Aoun et al. (Adv. Math.'20). Unfortunately, these bounds are weak in many important regimes. A recently developed strategy for analyzing up-down walks is based around a novel notion of spectral independence, which quantifies the dependence between variables in a distribution over {0, 1}^n using the largest eigenvalue of an associated pairwise influence matrix I, see Anari et al. (FOCS'20).
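The up-down (basis-exchange) walk on matroid bases can be sketched on a tiny example. The following is a minimal illustration under my own assumptions, not the construction from any of the cited papers: the ground set is the six edges of the complete graph K4, the bases of the graphic matroid are its spanning trees, and one step drops a uniformly random edge of the current tree and then adds a uniformly random edge among those that restore a spanning tree.

```python
import itertools
import random

def is_spanning_tree(edges, n=4):
    """Check that a set of edges on vertices 0..n-1 is acyclic and has n-1 edges
    (which on n vertices is exactly a spanning tree), using union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # adding this edge would close a cycle
        parent[ru] = rv
    return len(edges) == n - 1

def up_down_step(tree, ground_set, rng):
    """One step of the basis-exchange walk: remove a uniform edge of the tree,
    then add a uniform edge among those that yield a spanning tree again
    (the removed edge itself is always a valid candidate)."""
    e = rng.choice(sorted(tree))
    partial = tree - {e}
    candidates = [f for f in ground_set
                  if f not in partial and is_spanning_tree(partial | {f})]
    return frozenset(partial | {rng.choice(candidates)})

# Ground set: all 6 edges of K4 (graphic matroid of rank 3).
ground_set = [e for e in itertools.combinations(range(4), 2)]
rng = random.Random(0)
tree = frozenset([(0, 1), (1, 2), (2, 3)])  # arbitrary starting spanning tree

seen = set()
for _ in range(2000):
    tree = up_down_step(tree, ground_set, rng)
    seen.add(tree)

# By Cayley's formula K4 has 4^(4-2) = 16 spanning trees; the walk is
# irreducible on the bases, so a short run visits all of them.
print(len(seen))
```

Each state of the walk is a basis, and the stationary distribution of this chain is uniform over spanning trees; the cited results concern how quickly such walks mix on general matroids.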
Many works on spectral independence have in fact bounded a stronger quantity ∥I∥_∞→∞ ≥ λ_max(I), which we call ℓ∞-independence. We show that any distribution over {0, 1}^n which has bounded ℓ∞-independence satisfies a matrix Chernoff bound that in key regimes is much stronger than the spectral norm-Lipschitz function concentration bounds derived from matrix Poincaré inequalities. Our bounds match the matrix Chernoff bound for independent random variables due to Tropp, which is the strongest known in many cases and is essentially tight for several key settings. For spectral graph sparsification, our matrix concentration results are exponentially stronger than those obtained from matrix Poincaré inequalities. Our matrix Chernoff bound is a broad generalization and strengthening of the matrix Chernoff bound of Kyng and Song (FOCS'18). Using our bound, we conclude as a corollary that a union of O(log |V|) random spanning trees gives a spectral graph sparsifier of a graph with |V| vertices with high probability, matching results for independent edge sampling and matching lower bounds from Kyng and Song. This improves on the O(log² |V|) spanning trees required by the bound of Kyng and Song.
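The inequality ∥I∥_∞→∞ ≥ λ_max(I) behind ℓ∞-independence is easy to check numerically: for a symmetric matrix, the largest eigenvalue is bounded by the maximum absolute row sum, which is exactly the ∞→∞ operator norm. A self-contained check with made-up numbers (this is a generic 2×2 symmetric matrix, not the influence matrix of any actual distribution):

```python
import math

# Hypothetical symmetric 2x2 "influence matrix" with illustrative entries.
a, b, c = 0.5, 0.3, 0.2
I = [[a, b],
     [b, c]]

# ||I||_{inf->inf}: the maximum absolute row sum of the matrix.
inf_norm = max(sum(abs(x) for x in row) for row in I)

# Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
lam_max = ((a + c) + math.sqrt((a - c) ** 2 + 4 * b ** 2)) / 2

print(inf_norm, lam_max)
assert lam_max <= inf_norm  # the inf->inf norm dominates the top eigenvalue
```

Here inf_norm is 0.8 while lam_max is roughly 0.685, so bounding ∥I∥_∞→∞ is a more demanding requirement than bounding λ_max(I), which is why ℓ∞-independence is the stronger notion.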
Given a dataset V of points from some metric space, the popular k-center problem requires identifying a subset of k points (centers) in V minimizing the maximum distance of any point of V from its closest center. The robust formulation of the problem features a further parameter z and allows up to z points of V (outliers) to be disregarded when computing the maximum distance from the centers. In this paper, we focus on two important constrained variants of the robust k-center problem, namely the Robust Matroid Center (RMC) problem, where the set of returned centers is constrained to be an independent set of a matroid of rank k built on V, and the Robust Knapsack Center (RKC) problem, where each element i ∈ V is given a positive weight wi < 1 and the aggregate weight of the returned centers must be at most 1. We devise coreset-based strategies for the two problems which yield efficient sequential, MapReduce, and Streaming algorithms. More specifically, for any fixed ε > 0, the algorithms return solutions featuring a (3 + ε)-approximation ratio, which is a mere additive term ε away from the 3-approximations achievable by the best known polynomial-time sequential algorithms for the two problems. Moreover, the algorithms obliviously adapt to the intrinsic complexity of the dataset, captured by its doubling dimension D. For wide ranges of the parameters k, z, ε, D, we obtain a sequential algorithm with running time linear in |V|, and MapReduce/Streaming algorithms with few rounds/passes and substantially sublinear local/working memory.
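As a concrete reading of the objective, the robust k-center cost of a set of centers is the (z+1)-th largest distance from a point to its nearest center, i.e., the radius after the z farthest points are discarded as outliers. A minimal sketch with hypothetical 1-D data (this only evaluates the objective being optimized, not the coreset-based algorithms described above):

```python
def robust_kcenter_cost(points, centers, z):
    """k-center cost with z outliers: the largest nearest-center distance
    after the z most distant points are excluded."""
    nearest = sorted(min(abs(p - c) for c in centers) for p in points)
    return nearest[-(z + 1)]  # the (z+1)-th largest value

# Hypothetical 1-D instance: two tight clusters plus two far-away outliers.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 100.0, 200.0]
centers = [0.1, 1.1]  # one center per cluster (k = 2)

# Without outlier removal the two far points dominate the radius;
# with z = 2 they are excluded and the radius reflects the clusters.
print(robust_kcenter_cost(points, centers, z=0))
print(robust_kcenter_cost(points, centers, z=2))
```

With z = 0 the cost is driven entirely by the point at distance ~198.9 from its nearest center, while with z = 2 it drops to ~0.1, which illustrates why the robust formulation is the natural one in the presence of noise.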