We suggest general methods to construct asymptotically uniformly valid confidence intervals post-model-selection. The constructions are based on principles recently proposed by Berk et al. (2013). In particular, the candidate models used can be misspecified, the target of inference is model-specific, and coverage is guaranteed for any data-driven model selection procedure. After developing a general theory, we apply our methods to practically important situations where the candidate set of models, from which a working model is selected, consists of fixed design homoskedastic or heteroskedastic linear models, or of binary regression models with general link functions. In an extensive simulation study, we find that the proposed confidence intervals perform remarkably well, even when compared to existing methods that are tailored only for specific model selection procedures.
MSC 2010 subject classifications: Primary 62F12, 62F25; secondary 62F35, 62J02.
One of the most widely used properties of the multivariate Gaussian distribution, besides its tail behavior, is the fact that conditional means are linear and that conditional variances are constant. We here show that this property is also shared, in an approximate sense, by a large class of non-Gaussian distributions. We allow for several conditioning variables and we provide explicit non-asymptotic results, whereby we extend earlier findings of Hall and Li [7] and Leeb [13].
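For reference, the exact Gaussian property that the abstract generalizes can be written out as follows. The notation (a jointly Gaussian pair $(X, Y)$ with mean blocks $\mu_X, \mu_Y$ and covariance blocks $\Sigma_{XX}, \Sigma_{XY}, \Sigma_{YX}, \Sigma_{YY}$, with $\Sigma_{XX}$ invertible) is an assumption introduced here for illustration and is not taken from the paper.

```latex
\[
  \mathbb{E}[\,Y \mid X = x\,] = \mu_Y + \Sigma_{YX}\,\Sigma_{XX}^{-1}\,(x - \mu_X),
  \qquad
  \operatorname{Var}[\,Y \mid X = x\,] = \Sigma_{YY} - \Sigma_{YX}\,\Sigma_{XX}^{-1}\,\Sigma_{XY}.
\]
```

The conditional mean is affine in $x$ and the conditional variance does not depend on $x$. According to the abstract, this is the behavior shown to hold approximately for a large class of non-Gaussian distributions.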
Recently, several authors have re-examined the power of the classical F-test in a non-Gaussian linear regression under a "large-p, large-n" framework [e.g. 27, 29]. They highlight the loss of power as the number of regressors p increases relative to sample size n. These papers essentially focus only on the overall test of the null hypothesis that all p slope coefficients are equal to zero. Here, we consider the general case of testing q linear hypotheses on the (p + 1)-dimensional regression parameter vector that includes p slope coefficients and an intercept parameter. In the case of Gaussian design, we describe the dependence of the local asymptotic power function on both the relative number of parameters p/n and the relative number of hypotheses q/n being tested, showing that the negative effect of dimensionality is less severe if the number of hypotheses is small. Using the recent work of Srivastava and Vershynin [23] on high-dimensional sample covariance matrices, we are also able to substantially generalize previous results for non-Gaussian regressors.
MSC 2010 subject classifications: Primary 62F03, 62F05; secondary 62J05, 60F05, 60B20.
Local differential privacy has recently received increasing attention from the statistics community as a valuable tool to protect the privacy of individual data owners without the need for a trusted third party. Similar to the classic notion of randomized response, the idea is that data owners randomize their true information locally and only release the perturbed data. Many different protocols for such local perturbation procedures can be designed. In all the estimation problems studied in the literature so far, however, no significant difference in terms of minimax risk between purely non-interactive protocols and protocols that allow for some amount of interaction between individual data providers could be observed. In this paper we show that for estimating the integrated square of a density, sequentially interactive procedures improve substantially over the best possible non-interactive procedure in terms of minimax rate of estimation. In particular, in the non-interactive scenario we identify an elbow in the minimax rate at s = 3/4, whereas in the sequentially interactive scenario the elbow is at s = 1/2. This is markedly different both from the case of direct observations, where the elbow is well known to be at s = 1/4, and from the case where Laplace noise is added to the original data, where an elbow at s = 9/4 is obtained. The fact that a particular locally differentially private, but interactive, mechanism improves over the simple non-interactive one is also of great importance for practical implementations of local differential privacy.
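The local randomization idea mentioned above can be illustrated with the classic binary randomized response mechanism. The following is a minimal sketch under stated assumptions, not the procedure studied in the paper: it is purely non-interactive, it estimates a simple proportion rather than the integrated square of a density, and the function names `randomized_response` and `debiased_mean` as well as the privacy parameter `epsilon` are hypothetical names chosen for illustration.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Release a privatized copy of a single 0/1 value.

    The true bit is reported with probability e^eps / (1 + e^eps) and
    flipped otherwise, which is the classic epsilon-locally-private
    randomized response for binary data.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def debiased_mean(reports, epsilon):
    """Estimate the true proportion of ones from the privatized reports.

    If mu is the true proportion, then E[report] = (2p - 1) * mu + (1 - p)
    with p = e^eps / (1 + e^eps); this inverts that relation.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    epsilon = 1.0           # assumed privacy level, for illustration only
    # Synthetic data owners: roughly 30% hold the sensitive value 1.
    true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(200_000)]
    # Each owner randomizes locally; only the perturbed bits are released.
    reports = [randomized_response(b, epsilon, rng) for b in true_bits]
    print("true proportion:   ", sum(true_bits) / len(true_bits))
    print("private estimate:  ", debiased_mean(reports, epsilon))
```

Each data owner only needs their own value and a source of randomness, so no trusted third party is involved. A sequentially interactive protocol would additionally let a later owner's randomization depend on earlier released reports, which this sketch does not attempt.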