In a recent article, El Karoui et al. (Proc Natl Acad Sci 110(36):14557-14562, 2013) study the distribution of robust regression estimators in the regime in which the number of parameters p is of the same order as the number of samples n. Using numerical simulations and 'highly plausible' heuristic arguments, they unveil a striking new phenomenon: the regression coefficients contain an extra Gaussian noise component that is not explained by classical concepts such as the Fisher information matrix. We show here that this phenomenon can be characterized rigorously, using techniques that were developed by the authors for analyzing the Lasso estimator under high-dimensional asymptotics. We introduce an approximate message passing (AMP) algorithm to compute M-estimators and deploy state evolution to evaluate the operating characteristics of AMP, and hence also of M-estimates. Our analysis clarifies that the 'extra Gaussian noise' encountered in this problem is fundamentally similar to phenomena already studied for regularized least squares in the setting n < p.
Mathematics Subject Classification 62F10 · 62F12 · 60F99
M-estimation under high dimensional asymptotics

Consider the traditional linear regression model
\[
Y = X\theta_0 + W,
\]
where Y ∈ R^n is the vector of responses, X ∈ R^{n×p} is the design matrix, θ_0 ∈ R^p is the vector of regression coefficients, and W ∈ R^n is a vector of errors. We are interested in estimating θ_0 from observed data (Y, X) using a traditional M-estimator, defined by a non-negative convex function ρ : R → R≥0:
\[
\widehat{\theta} = \arg\min_{\theta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho\big(Y_i - \langle X_i, \theta \rangle\big),
\]
where ⟨u, v⟩ = Σ_{i=1}^m u_i v_i is the standard scalar product in R^m, and \widehat{\theta} is chosen arbitrarily if there are multiple minimizers. Although this is a completely traditional problem, we consider it under high-dimensional asymptotics, where the number of parameters p and the number of observations n both tend to infinity at the same rate. This asymptotic model is becoming popular owing to the modern awareness of 'big data' and the 'data deluge', but also because it leads to entirely new phenomena.
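To make the definition concrete, the following sketch computes an M-estimate by direct convex minimization. The choice of ρ as the Huber loss, the tuning constant delta = 1.345, and the synthetic data (heavy-tailed t-distributed errors) are all illustrative assumptions, not taken from the text; the paper's own computational approach via AMP comes later.

```python
import numpy as np
from scipy.optimize import minimize

def huber_rho(u, delta=1.345):
    """Huber's rho: quadratic near zero, linear in the tails.
    (An illustrative choice of the generic convex rho in the text.)"""
    a = np.abs(u)
    return np.where(a <= delta, 0.5 * u**2, delta * (a - 0.5 * delta))

def m_estimate(Y, X, rho=huber_rho):
    """argmin over theta of sum_i rho(Y_i - <X_i, theta>)."""
    p = X.shape[1]
    obj = lambda theta: np.sum(rho(Y - X @ theta))
    return minimize(obj, np.zeros(p), method="BFGS").x

# Synthetic example (hypothetical data): n = 200 observations, p = 5
# coefficients, heavy-tailed errors where a robust rho pays off over OLS.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
theta0 = np.ones(p)
W = rng.standard_t(df=2, size=n)   # heavy-tailed noise
Y = X @ theta0 + W

theta_hat = m_estimate(Y, X)
print(np.round(theta_hat, 2))
```

Note that this brute-force minimization is perfectly adequate at fixed small p; the regime studied in the paper, p comparable to n, is where both the computation and the classical theory become delicate.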
Extra Gaussian noise due to high-dimensional asymptotics

Classical statistical theory considered the situation where the number of regression parameters p is fixed and the number of samples n tends to infinity. The asymptotic distribution was found by Huber [2,18] to be normal N(0, V), where the asymptotic variance matrix V is given by
\[
V = V(\psi, F_W)\,\big(X^{\mathsf T} X\big)^{-1};
\]
here ψ = ρ′ is the score function of the M-estimator, V(ψ, F) = (∫ ψ² dF)/(∫ ψ′ dF)² is the asymptotic variance functional of [17], and (X^T X) is the usual Gram matrix associated with the least-squares problem. Importantly, it was found that for efficient estimation, i.e. the smallest possible asymptotic variance, the optimal M-estimator depends on the probability distribution F_W of the errors W. Choosing ψ(x) = −(log f_W(x))′ (with f_W the density of W), the asymptotic variance functional yields V(ψ, F_W) = 1/I(F_W), with I(F) denoting the Fisher information. This achieves the fundamental limit on the accuracy of M-estimators [18]. In modern statistical practice there is increasing interest in applications where the number of explanatory variables p is very large, and comparable to n. Examples of...
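The variance functional above is easy to evaluate numerically. The sketch below (an illustration, not from the text) approximates V(ψ, F) = (∫ ψ² dF)/(∫ ψ′ dF)² by a Riemann sum for standard Gaussian errors F = N(0, 1), comparing the least-squares score ψ(x) = x, which attains the Fisher bound 1/I(F) = 1 in the Gaussian case, with the Huber score at the (assumed) tuning constant delta = 1.345.

```python
import numpy as np

def V(psi, dpsi, grid):
    """Riemann-sum approximation of V(psi, F) under F = N(0, 1)."""
    dx = grid[1] - grid[0]
    f = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)  # N(0,1) density
    num = np.sum(psi(grid) ** 2 * f) * dx          # int psi^2 dF
    den = (np.sum(dpsi(grid) * f) * dx) ** 2       # (int psi' dF)^2
    return num / den

grid = np.linspace(-10.0, 10.0, 200001)

# Least squares: psi(x) = x gives V = 1, the Fisher bound for Gaussian F.
v_ls = V(lambda x: x, lambda x: np.ones_like(x), grid)

# Huber score with delta = 1.345: V comes out slightly above 1 under
# Gaussian errors (roughly 95% efficiency), the usual price paid for
# robustness against heavy-tailed error distributions.
delta = 1.345
v_huber = V(lambda x: np.clip(x, -delta, delta),
            lambda x: (np.abs(x) <= delta).astype(float), grid)

print(round(v_ls, 3), round(v_huber, 3))
```

Under the classical fixed-p asymptotics this functional tells the whole story; the point of the paper is that when p grows proportionally to n, it no longer does.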