We present O(log log n)-round algorithms in the Massively Parallel Computation (MPC) model, with Õ(n) memory per machine, that compute a maximal independent set, a (1 + ε)-approximation of maximum matching, and a (2 + ε)-approximation of minimum vertex cover, for any n-vertex graph and any constant ε > 0. These improve the state of the art as follows:
• Our MIS algorithm leads to a simple O(log log ∆)-round MIS algorithm in the CONGESTED-CLIQUE model of distributed computing, which improves on the Õ(√(log ∆))-round algorithm of Ghaffari [PODC'17].
• Our O(log log n)-round (1 + ε)-approximate maximum matching algorithm simplifies or improves on the following prior work: the O(log² log n)-round (1 + ε)-approximation algorithm of Czumaj et al. [STOC'18] and the O(log log n)-round (1 + ε)-approximation algorithm of Assadi et al. [SODA'19].
• Our O(log log n)-round (2 + ε)-approximate minimum vertex cover algorithm improves on an O(log log n)-round O(1)-approximation of Assadi et al. [arXiv'17].
The Models
We consider two closely related models: Massively Parallel Computation (MPC), and the CONGESTED-CLIQUE model of distributed computing. Indeed, we consider it a conceptual contribution of this paper to (further) exhibit the proximity of these two models. We next review these models.
The MPC model
The MPC model was first introduced in [KSV10] and later refined in [GSZ11, BKS13, ANOY14]. The computation in this model proceeds in synchronous rounds carried out by m machines. At the beginning of every round, the data (e.g., vertices and edges) is distributed across the machines. During a round, each machine performs computation locally without communicating with other machines. At the end of the round, the machines exchange messages, which are used to guide the computation in the next round. In every round, each machine receives and outputs messages that fit into its local memory.
Space: In this model, each machine has S words of space.
If N is the total size of the data and each machine has S words of space, the typical settings of interest are those in which S is sublinear in N and S · m = Θ(N). That is, the total memory across all the machines suffices to fit all the data, but is not much larger than that. If we are given a graph on n vertices, in our work we consider the regimes in which S ∈ Θ(n / polylog n) or S ∈ Θ(n).
Communication vs. computational complexity: Our main focus is the number of rounds required to finish the computation, which is essentially the complexity of the communication needed to solve the problem. Although we do not explicitly state the computational complexity in our results, it will be apparent from the description of our algorithms that the total computation time across all the machines is nearly linear in the input size.
CONGESTED-CLIQUE
A second model that we consider is the CONGESTED-CLIQUE model of distributed computing, which was introduced by Lotker, Pavlov, Patt-Shamir, and Peleg [LPPSP03] and has been studied extensively since then. In this model, we have n players which can communicate in sync...
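The three graph problems named in the abstract above can be pinned down with a small sequential sketch. This is not the MPC or CONGESTED-CLIQUE algorithm of the paper; it is only an illustration of the definitions, using textbook greedy procedures. All function names are hypothetical. The vertex cover step relies on the standard fact that the endpoints of any maximal matching form a vertex cover of size at most twice the minimum.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def adjacency(n: int, edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    """Build an adjacency map for an undirected graph on vertices 0..n-1."""
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def greedy_mis(n: int, edges: List[Edge],
               order: Optional[Iterable[int]] = None) -> Set[int]:
    """Sequential greedy maximal independent set.

    Scans vertices in the given order (default 0..n-1) and keeps a vertex
    whenever none of its neighbours has been kept already.
    """
    adj = adjacency(n, edges)
    mis: Set[int] = set()
    blocked: Set[int] = set()
    for v in (order if order is not None else range(n)):
        if v not in blocked:
            mis.add(v)
            blocked.add(v)
            blocked |= adj[v]
    return mis


def greedy_maximal_matching(edges: List[Edge]) -> List[Edge]:
    """Sequential greedy maximal matching: keep an edge if both ends are free."""
    matched: Set[int] = set()
    matching: List[Edge] = []
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


def cover_from_matching(matching: List[Edge]) -> Set[int]:
    """Endpoints of a maximal matching: a 2-approximate vertex cover."""
    return {x for edge in matching for x in edge}


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]  # the path 0 - 1 - 2 - 3
    print(greedy_mis(4, path))
    print(greedy_maximal_matching(path))
    print(cover_from_matching(greedy_maximal_matching(path)))
```

On the 4-vertex path, the minimum vertex cover is {1, 2}, while the cover built from the greedy matching uses all four vertices, which shows that the factor of 2 can be attained.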
We study the problem of estimating the value of sums of the form Sp
We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a d-variate normal N(µ, Σ) means that a sample is revealed only if it falls in some subset S ⊆ R^d; otherwise the samples are hidden, and their count in proportion to the revealed samples is also hidden. We show that the mean µ and covariance matrix Σ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to S, and S has non-trivial measure under the unknown d-variate normal distribution. Additionally, we show that without oracle access to S, any non-trivial estimation is impossible.
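The sampling model described above can be made concrete with a short simulation. The sketch below is not the estimation algorithm of the paper. It only generates truncated samples by rejection, with the set S given as a membership oracle, and shows that the naive empirical mean of the revealed samples is biased. The function names, the choice of passing a lower-triangular factor `chol` with Σ = chol · cholᵀ, and the `max_draws` safeguard are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def sample_normal(mu: Sequence[float], chol: Sequence[Sequence[float]],
                  rng: random.Random) -> Vector:
    """Draw one sample from N(mu, chol * chol^T), chol lower triangular."""
    d = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1))
            for i in range(d)]


def truncated_samples(mu: Sequence[float], chol: Sequence[Sequence[float]],
                      in_set: Callable[[Vector], bool], count: int,
                      rng: random.Random,
                      max_draws: int = 1_000_000) -> List[Vector]:
    """Return `count` samples that fall in S; draws outside S are discarded.

    The caller only sees the revealed samples, never the number of hidden
    ones, matching the truncation model in the text.
    """
    revealed: List[Vector] = []
    draws = 0
    while len(revealed) < count:
        if draws >= max_draws:
            raise RuntimeError("S appears to have negligible measure")
        draws += 1
        x = sample_normal(mu, chol, rng)
        if in_set(x):  # membership oracle for S
            revealed.append(x)
    return revealed


def empirical_mean(samples: Sequence[Vector]) -> Vector:
    """Coordinate-wise average of the samples."""
    d = len(samples[0])
    return [sum(x[i] for x in samples) / len(samples) for i in range(d)]


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.0, 0.0]
    chol = [[1.0, 0.0], [0.0, 1.0]]  # identity covariance
    xs = truncated_samples(mu, chol, lambda x: x[0] > 0.0, 2000, rng)
    # The first coordinate of the naive mean is pulled away from the true 0.
    print(empirical_mean(xs))
```

With S the half-space x₁ > 0 and a standard normal, the first coordinate of the revealed samples follows a half-normal distribution with mean √(2/π) ≈ 0.80, so averaging the revealed samples does not recover µ; this is the bias that an estimator for truncated data has to correct.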
We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution D over [n] and a property P, the goal is to distinguish between D ∈ P and ℓ_1(D, P) > ε. We develop a general algorithm for this question, which applies to a large range of "shape-constrained" properties, including monotone, log-concave, t-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly-tight upper and lower bounds for the corresponding questions.
... in Theoretical Computer Science, originating from the papers of Batu et al. [BFR+00, BFF+01, GR00] has also been tackling similar questions in the setting of property testing (see [Ron08, Ron10, Rub12, Can15] for surveys on this field). This very active area has seen a spate of results and breakthroughs over the past decade, culminating in very efficient (both sample- and time-wise) algorithms for a wide range of distribution testing problems [BDKR05, GMV06, AAK+07, DDS+13, CDVV14, AD15, DKN15b].
In many cases, this led to a tight characterization of the number of samples required for these tasks as well as the development of new tools and techniques, drawing connections to learning and information theory [VV10, VV11a, VV14].
In this paper, we focus on the following general property testing problem: given a class (property) of distributions P and sample access to an arbitrary distribution D, one must distinguish between the case that (a) D ∈ P, versus (b) ‖D − D′‖_1 > ε for all D′ ∈ P (i.e., D is either in the class, or far from it). While many of the previous works have focused on the testing of specific properties of distributions or obtained algorithms and lower bounds on a case-by-case basis, an emerging trend in distribution testing is to design general frameworks that can be applied to several property testing problems [Val11, VV11a, DKN15b, DKN15a]. This direction, the testing analog of a similar movement in distribution learning [CDSS13, CDSS14b, CDSS14a, ADLS15], aims at abstracting the minimal assumptions that are shared by a large variety of problems, and giving algorithms that can be used for any of these problems. In this work, we make significant progress in this direction by providing a unified framework for the question of testing various properties of probability distributions. More specifically, we describe a generic technique to obtain upper bounds on the sample complexity of this question, which applies to a broad range of structured classes. Our technique yields sample near-optimal and computationally efficient testers for a wide ran...
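The two cases (a) and (b) of the testing problem stated above can be illustrated on explicitly given distributions. The sketch below is not a tester and not the framework of the paper: a tester only sees samples, whereas this code reads the probability vectors directly. It assumes distributions over [n] are given as lists of probabilities, and it uses a finite list of candidates as a hypothetical stand-in for the class P, which in the paper's setting is typically infinite. All names are illustrative.

```python
from typing import List, Sequence

Distribution = Sequence[float]


def l1_distance(p: Distribution, q: Distribution) -> float:
    """The l1 distance: sum over the domain of |p(i) - q(i)|."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same domain [n]")
    return sum(abs(a - b) for a, b in zip(p, q))


def is_monotone_nonincreasing(p: Distribution) -> bool:
    """Membership check for one shape-constrained class named in the text."""
    return all(p[i] >= p[i + 1] for i in range(len(p) - 1))


def l1_to_finite_class(d: Distribution,
                       candidates: List[Distribution]) -> float:
    """Distance from d to a finite candidate list (a stand-in for P)."""
    return min(l1_distance(d, c) for c in candidates)


def which_case(d: Distribution, candidates: List[Distribution],
               eps: float) -> str:
    """Report case (a), case (b), or the gap region where no promise holds."""
    dist = l1_to_finite_class(d, candidates)
    if dist == 0.0:
        return "(a) in the class"
    if dist > eps:
        return "(b) eps-far from the class"
    return "neither: close to the class but not in it"


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    point_mass = [1.0, 0.0, 0.0, 0.0]
    print(l1_distance(uniform, point_mass))
    print(is_monotone_nonincreasing(point_mass))
    print(which_case(uniform, [point_mass], eps=0.5))
```

The third outcome of `which_case` reflects the promise structure of the problem: a tester may answer arbitrarily on inputs that are neither in the class nor ε-far from it.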