The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates p is of the same order or larger than the number of observations n. Classical asymptotic normality theory does not apply to this model due to two fundamental reasons: (1) The regularized risk is non-smooth; (2) The distance between the estimator θ and the true parameters vector θ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail.On the other hand, the Lasso estimator can be precisely characterized in the regime in which both n and p are large and n/p is of order one. This characterization was first obtained in the case of standard Gaussian designs, and subsequently generalized to other high-dimensional estimation procedures.Here we extend the same characterization to Gaussian correlated designs with non-singular covariance structure. This characterization is expressed in terms of a simpler "fixed-design" model. We establish non-asymptotic bounds on the distance between the distribution of various quantities in the two models, which hold uniformly over signals θ in a suitable sparsity class and values of the regularization parameter.As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
We study mean-field variational Bayesian inference using the TAP approach, for Z2-synchronization as a prototypical example of a high-dimensional Bayesian model. We show that for any signal strength λ > 1 (the weak-recovery threshold), there exists a unique local minimizer of the TAP free energy functional near the mean of the Bayes posterior law. Furthermore, the TAP free energy in a local neighborhood of this minimizer is strongly convex. Consequently, a natural-gradient/mirror-descent algorithm achieves linear convergence to this minimizer from a local initialization, which may be obtained by a finite number of iterates of Approximate Message Passing (AMP). This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy.We also analyze the finite-sample convergence of AMP, showing that AMP is asymptotically stable at the TAP minimizer for any λ > 1, and is linearly convergent to this minimizer from a spectral initialization for sufficiently large λ. Such a guarantee is stronger than results obtainable by state evolution analyses, which only describe a fixed number of AMP iterations in the infinite-sample limit.Our proofs combine the Kac-Rice formula and Sudakov-Fernique Gaussian comparison inequality to analyze the complexity of critical points that satisfy strong convexity and stability conditions within their local neighborhoods.
In high-dimensional regression, we attempt to estimate a parameter vector β 0 ∈ R p from n p observations {(y i , x i )} i≤n where x i ∈ R p is a vector of predictors and y i is a response variable. A well-estabilished approach uses convex regularizers to promote specific structures (e.g. sparsity) of the estimate β, while allowing for practical algorithms. Theoretical analysis implies that convex penalization schemes have nearly optimal estimation properties in certain settings. However, in general the gaps between statistically optimal estimation (with unbounded computational resources) and convex methods are poorly understood.We show that, in general, a large gap exists between the best performance achieved by any convex regularizer and the optimal statistical error. Remarkably, we demonstrate that this gap is generic as soon as we try to incorporate very simple structural information about the empirical distribution of the entries of β 0 . Our results follow from a detailed study of standard Gaussian designs, a setting that is normally considered particularly friendly to convex regularization schemes such as the Lasso. We prove a lower bound on the estimation error achieved by any convex regularizer which is invariant under permutations of the coordinates of its argument. This bound is expected to be generally tight, and indeed we prove tightness under certain conditions. Further, it implies a gap with respect to Bayes-optimal estimation that can be precisely quantified and persists if the prior distribution of the signal β 0 is known to the statistician.Our results provide rigorous evidence towards a broad conjecture regarding computationalstatistical gaps in high-dimensional estimation. Contents
We study a class of deterministic flows in R d×k , parametrized by a random matrix X ∈ R n×d with i.i.d. centered subgaussian entries. We characterize the asymptotic behavior of these flows over bounded time horizons, in the high-dimensional limit in which n, d → ∞ with k fixed and converging aspect ratios n/d → δ. The asymptotic characterization we prove is in terms of a system of a nonlinear stochastic process in k dimensions, whose parameters are determined by a fixed point condition. This type of characterization is known in physics as dynamical mean field theory. Rigorous results of this type have been obtained in the past for a few spin glass models. Our proof is based on time discretization and a reduction to certain iterative schemes known as approximate message passing (AMP) algorithms, as opposed to earlier work that was based on large deviations theory and stochastic processes theory. The new approach allows for a more elementary proof and implies that the high-dimensional behavior of the flow is universal with respect to the distribution of the entries of X.As specific applications, we obtain high-dimensional characterizations of gradient flow in some classical models from statistics and machine learning, under a random design assumption.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.