The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.

Comment: Published at http://dx.doi.org/10.1214/009053606000000281 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
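The node-wise scheme is easy to state concretely. Below is a minimal sketch in Python of neighborhood selection with the Lasso: each variable is regressed on all the others, the nonzero coefficients define that node's estimated neighborhood, and the neighborhoods are combined into an edge set with an AND rule (both endpoints select each other) or an OR rule. The penalty level `lam` and the toy chain-graph example are illustrative assumptions, not the paper's specific penalty choice λ(α).

```python
# Minimal sketch of neighborhood selection with the Lasso.
# Assumptions: `lam` is an illustrative penalty level; the AND/OR
# combination rules are standard but not spelled out in the abstract.
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, lam=0.05, rule="and"):
    """Estimate the edge set of a Gaussian graphical model.

    Each variable X_j is regressed on all other variables with the Lasso;
    the nonzero coefficients define node j's estimated neighborhood.
    """
    n, p = X.shape
    selected = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        fit = Lasso(alpha=lam).fit(X[:, others], X[:, j])
        selected[j, others] = fit.coef_ != 0
    # Combine the p node-wise neighborhoods into one symmetric edge set.
    return selected & selected.T if rule == "and" else selected | selected.T

# Toy usage: recover a sparse chain graph from simulated Gaussian data.
rng = np.random.default_rng(0)
p = 5
prec = np.eye(p) + np.diag(0.4 * np.ones(p - 1), 1) + np.diag(0.4 * np.ones(p - 1), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=500)
print(neighborhood_selection(X))  # should show edges along the chain
```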
Summary. Estimation of structure, such as in variable selection, graphical modelling or cluster analysis, is notoriously difficult, especially for high-dimensional data. We introduce stability selection. It is based on subsampling in combination with (high-dimensional) selection algorithms. As such, the method is extremely general and has a very wide range of applicability. Stability selection provides finite-sample control for some error rates of false discoveries and hence a transparent principle for choosing a proper amount of regularization for structure estimation. Variable selection and structure estimation improve markedly for a range of selection methods if stability selection is applied. We prove for the randomized lasso that stability selection is consistent for variable selection even if the necessary conditions for consistency of the original lasso method are violated. We demonstrate stability selection for variable selection and Gaussian graphical modelling, using real and simulated data.
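As a concrete illustration of the subsampling idea, here is a minimal sketch of stability selection with a plain Lasso base learner: variables are scored by how often they are selected across random subsamples of size n/2, and those whose selection frequency exceeds a threshold form the stable set. The values of `alpha`, `n_subsamples` and `threshold` are illustrative assumptions; the randomized-lasso variant analyzed in the summary (which additionally perturbs the penalty weights) is not shown.

```python
# Minimal sketch of stability selection with a Lasso base learner.
# Assumptions: alpha, n_subsamples and threshold are illustrative choices,
# not values prescribed by the paper.
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.05, n_subsamples=100, threshold=0.7, seed=0):
    """Return per-variable selection frequencies and the stable set."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)  # subsample of size n/2
        fit = Lasso(alpha=alpha).fit(X[idx], y[idx])
        counts += fit.coef_ != 0                          # tally selected variables
    freq = counts / n_subsamples
    return freq, np.flatnonzero(freq >= threshold)

# Toy usage: two true signal variables among fifty.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(n)
freq, stable = stability_selection(X, y)
print(stable)  # typically indices 0 and 1
```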
convergence to the limit is not uniform. Furthermore, bootstrap and even subsampling techniques are plagued by non-continuity of limiting distributions. Nevertheless, in the low-dimensional setting, a modified bootstrap scheme has been proposed; [13] and [14] have recently proposed a residual-based bootstrap scheme. They provide consistency guarantees for the high-dimensional setting; we consider this method in an empirical analysis in Section 4.

Some approaches for quantifying uncertainty include the following. The work in [50] implicitly contains the idea of sample splitting and the corresponding construction of p-values and confidence intervals, and the procedure has been improved by using multiple sample splitting and aggregation of dependent p-values from multiple sample splits [32]. Stability selection [31] and its modification [41] provide another route to estimating error measures for false positive selections in general high-dimensional settings. An alternative method for obtaining confidence sets appears in the recent work [29]. From another, mainly theoretical perspective, the work in [24] presents necessary and sufficient conditions for recovery with the lasso $\hat{\beta}$ in terms of $\|\hat{\beta} - \beta^0\|_\infty$, where $\beta^0$ denotes the true parameter: bounds on the latter, which hold with probability at least, say, $1 - \alpha$, could in principle be used to construct (very) conservative confidence regions. At a theoretical level, the paper [35] derives confidence intervals in $\ell_2$ for the case of two possible sparsity levels. Other recent work is discussed in Section 1.1 below.

We propose here a method which enjoys optimality properties when making assumptions on the sparsity and design matrix of the model. For a linear model, the procedure is the same as the one in [52] and closely related to the method in [23]. It is based on the lasso and "inverts" the corresponding KKT conditions. This yields a non-sparse estimator which has a Gaussian (limiting) distribution. We show, within a sparse linear model setting, that the estimator is optimal in the sense that it reaches the semiparametric efficiency bound. The procedure can be used, and is analyzed, for high-dimensional sparse linear and generalized linear models and for regression problems with general convex (robust) loss functions.

1.1. Related work. Our work is closest to [52], which proposed the semiparametric approach for distributional inference in a high-dimensional linear model. We take here a slightly different viewpoint, namely inverting the KKT conditions from the lasso, whereas relaxed projections are used in [52]. Furthermore, our paper extends the results in [52] by (i) treating generalized linear models and general convex loss functions, and (ii) for linear models, giving conditions under which the procedure achieves the semiparametric efficiency bound; our analysis allows for rather general Gaussian, sub-Gaussian and bounded design. A related approach to the one in [52] was proposed in [8] based on ridge regression, which is clearly suboptimal and inefficient...
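For the linear-model case, the construction can be sketched as follows: build a nodewise-lasso approximation $\hat{\Theta}$ of the inverse Gram matrix, then "invert" the KKT conditions via a one-step correction of the lasso estimate, $\hat{b} = \hat{\beta} + \hat{\Theta} X^\top (y - X\hat{\beta})/n$, which is non-sparse and asymptotically Gaussian. The code below is a minimal sketch under these assumptions; the penalty levels `lam` and `lam_node` and the plug-in noise estimate are naive illustrative choices, not the tuned ones analyzed in the paper.

```python
# Minimal sketch of a de-sparsified lasso for the linear model.
# Assumptions: lam and lam_node are illustrative penalty levels, and the
# residual-based noise estimate is a naive stand-in for a careful one.
import numpy as np
from sklearn.linear_model import Lasso

def desparsified_lasso(X, y, lam=0.1, lam_node=0.1):
    """Return the de-sparsified estimate and rough asymptotic standard errors."""
    n, p = X.shape
    beta = Lasso(alpha=lam).fit(X, y).coef_
    # Nodewise lasso: row j of Theta approximates row j of the inverse Gram matrix.
    Theta = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        gamma = Lasso(alpha=lam_node).fit(X[:, others], X[:, j]).coef_
        resid = X[:, j] - X[:, others] @ gamma
        tau2 = resid @ resid / n + lam_node * np.abs(gamma).sum()
        Theta[j, j] = 1.0 / tau2
        Theta[j, others] = -gamma / tau2
    # One-step correction ("inverting" the KKT conditions): non-sparse estimate.
    b = beta + Theta @ X.T @ (y - X @ beta) / n
    # Rough standard errors from the Gaussian limit, with a plug-in noise level.
    sigma2 = np.sum((y - X @ beta) ** 2) / n
    se = np.sqrt(sigma2 * np.diag(Theta @ (X.T @ X / n) @ Theta.T) / n)
    return b, se
```

Confidence intervals then follow in the usual way, e.g. `b[j] ± 1.96 * se[j]` for coordinate j, under the stated assumptions.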