We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. This is an oracle that takes as input a subset S ⊆ [N ] of the domain [N ] of the unknown probability distribution D and returns a draw from the conditional probability distribution D restricted to S. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive.We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether D is the uniform distribution U; testing whether D = D * for an explicitly provided D * ; testing whether two unknown distributions D 1 and D 2 are equivalent; and estimating the variation distance between D and the uniform distribution. At a high level our main finding is that the new conditional sampling framework we consider is a powerful one: while all the problems mentioned above have Ω( √ N ) sample complexity in the standard model (and in some cases the complexity must be almost linear in N ), we give poly(log N, 1/ǫ)-query algorithms (and in some cases poly(1/ǫ)-query algorithms independent of N ) for all these problems in our conditional sampling setting. * ccanonne@cs.columbia.edu, Columbia University.
A central server needs to perform statistical inference based on samples that are distributed over multiple users who can each send a message of limited length to the center. We study problems of distribution learning and identity testing in this distributed inference setting and examine the role of shared randomness as a resource. We propose a generalpurpose simulate-and-infer strategy that uses only private-coin communication protocols and is sample-optimal for distribution learning. This general strategy turns out to be sample-optimal even for distribution testing among private-coin protocols. Interestingly, we propose a public-coin protocol that outperforms simulate-and-infer for distribution testing and is, in fact, sampleoptimal. Underlying our public-coin protocol is a random hash that when applied to the samples minimally contracts the chisquared distance of their distribution to the uniform distribution.
We study the problem of testing conditional independence for discrete distributions. Specifically, given samples from a discrete random variable, we want to distinguish, with probability at least 2/3, between the case that X and Y are conditionally independent given Z from the case that (X, Y, Z) is ε-far, in 1 -distance, from every distribution that has this property. Conditional independence is a concept of central importance in probability and statistics with a range of applications in various scientific domains. As such, the statistical task of testing conditional independence has been extensively studied in various forms within the statistics and econometrics communities for nearly a century. Perhaps surprisingly, this problem has not been previously considered in the framework of distribution property testing and in particular no tester with sublinear sample complexity is known, even for the important special case that the domains of X and Y are binary.The main algorithmic result of this work is the first conditional independence tester with sublinear sample complexity for discrete distributions over [ 1 ] × [ 2 ] × [n]. To complement our upper bounds, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, for a number of settings. Specifically, for the prototypical setting when 1 , 2 = O(1), we show that the sample complexity of testing conditional independence (upper bound and matching lower bound) is *
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.