We consider the identity testing problem (or goodness-of-fit testing problem) in multivariate binomial families, multivariate Poisson families and multinomial distributions. Given a known distribution p and n i.i.d. samples drawn from an unknown distribution q, we investigate how large ρ > 0 should be to distinguish, with high probability, the case p = q from the case d(p, q) ≥ ρ, where d denotes a specific distance over probability distributions. We answer this question for a family of distances: d(p, q) = ‖p − q‖_t for t ∈ [1, 2], where ‖·‖_t is the entrywise ℓ_t norm. Besides being locally minimax-optimal (i.e. characterizing the detection threshold as a function of the known matrix p), our tests have simple expressions and are easily implementable.
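As a rough illustration of the testing problem above, the following sketch computes a classical plug-in statistic for the squared ℓ_2 distance. It is not the test proposed in the paper. It assumes Poisson sampling, i.e. independent counts N_j ~ Poisson(n q_j), in which case (N_j − n p_j)² − N_j has expectation n² (q_j − p_j)². The function name `l2_statistic` and the idea of rejecting when the statistic exceeds a threshold are illustrative assumptions.

```python
def l2_statistic(counts, p, n):
    """Estimate ||p - q||_2^2 from observed counts (illustrative sketch).

    Assumption: counts[j] ~ Poisson(n * q[j]) independently, so that
    E[(counts[j] - n * p[j])**2 - counts[j]] = n**2 * (q[j] - p[j])**2.
    """
    if len(counts) != len(p):
        raise ValueError("counts and p must have the same length")
    total = 0.0
    for c, pj in zip(counts, p):
        # Centered square minus the count removes the Poisson variance term.
        total += (c - n * pj) ** 2 - c
    return total / n ** 2


# Hypothetical usage: a large value suggests that q is far from p.
p = [0.5, 0.25, 0.25]
stat = l2_statistic([100, 0, 0], p, n=100)
```

A test built on this statistic would reject p = q when the value exceeds a threshold calibrated under the null; choosing that threshold optimally is precisely what the local minimax analysis addresses.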
We consider the goodness-of-fit testing problem for Hölder-smooth densities over R^d: given n i.i.d. observations with unknown density p and given a known density p_0, we investigate how large ρ should be to distinguish, with high probability, the case p = p_0 from the composite alternative of all Hölder-smooth densities p such that ‖p − p_0‖_t ≥ ρ, where t ∈ [1, 2]. The densities are assumed to be defined over R^d and to have Hölder smoothness parameter α > 0. In the present work, we solve the case α ≤ 1 and handle the case α > 1 using an additional technical restriction on the densities. We identify matching upper and lower bounds on the local minimax rates of testing, given explicitly in terms of p_0. We propose novel test statistics which we believe could be of independent interest. We also establish the first definition of an explicit cutoff u_B allowing us to split R^d into a bulk part (defined as the subset of R^d where p_0 takes only values greater than or equal to u_B) and a tail part (defined as the complement of the bulk), each part involving fundamentally different contributions to the local minimax rates of testing.
Although robust learning and local differential privacy are both widely studied fields of research, combining the two settings is an almost unexplored topic. We consider the problem of estimating a discrete distribution in total variation from n contaminated data batches under a local differential privacy constraint. A fraction 1 − ε of the batches contain k i.i.d. samples drawn from a discrete distribution p over d elements. To protect the users' privacy, each of the samples is privatized using an α-locally differentially private mechanism. The remaining εn batches are an adversarial contamination. The minimax rate of estimation under contamination alone, with no privacy, is known to be ε/√k + √(d/(kn)), up to a √log(1/ε) factor. Under the privacy constraint alone, the minimax rate of estimation is √(d²/(α²kn)). We show that combining the two constraints leads to a minimax estimation rate of ε√(d/(α²k)) + √(d²/(α²kn)), up to a √log(1/ε) factor, larger than the sum of the two separate rates. We provide a polynomial-time algorithm achieving this bound, as well as a matching information-theoretic lower bound.
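To make the privacy constraint concrete, here is a minimal sketch of one standard α-locally differentially private mechanism, k-ary randomized response, together with the usual debiasing of reported frequencies. The abstract does not say which mechanism the paper uses, so this is an assumption for illustration only, and the sketch ignores adversarial batches entirely. The names `privatize` and `debias` are hypothetical.

```python
import math
import random


def privatize(x, d, alpha, rng):
    """Randomized response on {0, ..., d-1}: keep x with probability
    e^alpha / (e^alpha + d - 1), otherwise report one of the other
    d - 1 values uniformly at random. This satisfies alpha-LDP."""
    keep = math.exp(alpha) / (math.exp(alpha) + d - 1)
    if rng.random() < keep:
        return x
    other = rng.randrange(d - 1)  # index among the d - 1 values != x
    return other if other < x else other + 1


def debias(freqs, d, alpha):
    """Invert the mechanism on reported frequencies.

    A report equals j with probability a * p[j] + b * (1 - p[j]),
    where a = e^alpha / (e^alpha + d - 1) and b = 1 / (e^alpha + d - 1).
    """
    a = math.exp(alpha) / (math.exp(alpha) + d - 1)
    b = 1.0 / (math.exp(alpha) + d - 1)
    return [(r - b) / (a - b) for r in freqs]


# Hypothetical usage: privatize one sample from a 5-element domain.
rng = random.Random(0)
report = privatize(2, d=5, alpha=1.0, rng=rng)
```

The debiasing step divides by a − b, which shrinks as α decreases; this is one way to see why stronger privacy inflates the estimation error.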
Given a heterogeneous Gaussian sequence model with unknown mean θ ∈ R^d and known covariance matrix Σ = diag(σ_1², ..., σ_d²), we study the signal detection problem against sparse alternatives, for known sparsity s. Namely, we characterize how large ε* > 0 should be in order to distinguish with high probability the null hypothesis θ = 0 from the alternative composed of s-sparse vectors in R^d, separated from 0 in L^t norm (t ≥ 1) by at least ε*. We find minimax upper and lower bounds on the minimax separation radius ε* and prove that they are always matching. We also derive the corresponding minimax tests achieving these bounds. Our results reveal new phase transitions regarding the behavior of ε* with respect to the level of sparsity, to the L^t metric, and to the heteroscedasticity profile of Σ. In the case of the Euclidean (i.e. L^2) separation, we bridge the remaining gaps in the literature.
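For intuition about the detection problem above, the sketch below computes a classical standardized chi-square statistic for observations X_i ~ N(θ_i, σ_i²). It is not one of the minimax tests derived in the paper: it ignores sparsity and the choice of L^t norm, and serves only as a baseline. Under θ = 0 each term (X_i/σ_i)² − 1 has mean 0 and variance 2, so the statistic is approximately standard normal for large d. The function name is hypothetical.

```python
import math


def chi2_detection_statistic(x, sigma):
    """Standardized chi-square statistic for testing theta = 0.

    Assumption: x[i] ~ N(theta[i], sigma[i]**2) independently, with
    sigma known. Returns sum((x_i / sigma_i)^2 - 1) / sqrt(2 d).
    """
    if len(x) != len(sigma):
        raise ValueError("x and sigma must have the same length")
    d = len(x)
    total = sum((xi / si) ** 2 - 1.0 for xi, si in zip(x, sigma))
    return total / math.sqrt(2.0 * d)


# Hypothetical usage: large positive values point away from theta = 0.
stat = chi2_detection_statistic([2.0, 6.0], [1.0, 3.0])
```

Rescaling each coordinate by σ_i treats all coordinates alike, which is generally not the right weighting when the noise levels differ widely; the dependence of ε* on the heteroscedasticity profile is one of the questions the abstract addresses.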