Estimating the eigenvalues of a population covariance matrix from a sample covariance matrix is a problem of fundamental importance in multivariate statistics; the eigenvalues of covariance matrices play a key role in many widely used techniques, in particular in Principal Component Analysis (PCA). In many modern data analysis problems, statisticians are faced with large datasets where the sample size, n, is of the same order of magnitude as the number of variables, p. Random matrix theory predicts that in this context, the eigenvalues of the sample covariance matrix are not good estimators of the eigenvalues of the population covariance. We propose to use a fundamental result in random matrix theory, the Marčenko-Pastur equation, to better estimate the eigenvalues of large-dimensional covariance matrices. The Marčenko-Pastur equation holds in very wide generality and under weak assumptions. The estimator we obtain can be thought of as "shrinking" the eigenvalues of the sample covariance matrix in a nonlinear fashion to estimate the population eigenvalues. Inspired by ideas from random matrix theory, we also suggest a change of point of view when thinking about estimation of high-dimensional vectors: we do not try to estimate the vectors directly, but rather a probability measure that describes them. We think this is a theoretically more fruitful way to approach these problems. Our estimator gives fast and good or very good results in extended simulations. Our algorithmic approach is based on convex optimization. We also show that the proposed estimator is consistent.

Acknowledgements: The author is grateful to Alexandre d'Aspremont, Peter Bickel, Laurent El Ghaoui, Elizabeth Purdom, John Rice, Saharon Rosset and Bin Yu for stimulating discussions and comments at various stages of this project. Support from NSF grant DMS-0605169 is gratefully acknowledged. AMS 2000 SC: Primary 62H12, Secondary 62-09.
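For reference, one standard way of writing the Marčenko-Pastur equation is Silverstein's form; the notation below is ours and the regularity conditions are omitted, so treat the details as an assumption rather than a quotation from the paper. With $H$ the limiting spectral distribution of the population covariance, $\gamma = \lim p/n$, and $\underline{m}(z)$ the Stieltjes transform of the limiting companion spectral distribution of the sample covariance matrix, the equation reads:

```latex
% Marčenko-Pastur equation, Silverstein's form (standard statement; notation
% and conditions are ours, not quoted from the abstract above). H is the
% limiting population spectral distribution, gamma = lim p/n, and
% \underline{m}(z) is the Stieltjes transform of the limiting companion
% spectral distribution of the sample covariance matrix.
z = -\frac{1}{\underline{m}(z)}
    + \gamma \int \frac{\tau}{1 + \tau\, \underline{m}(z)} \, dH(\tau),
\qquad z \in \mathbb{C}^{+}.
```

Since the equation links the observable limit $\underline{m}$ to the unknown $H$, one can discretize $H$ and search for a discretization that approximately satisfies the equation, which is where a convex-optimization formulation of the kind mentioned above can enter.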
We consider the asymptotic fluctuation behavior of the largest eigenvalue of certain sample covariance matrices in the asymptotic regime where both dimensions of the corresponding data matrix go to infinity. More precisely, let X be an n × p matrix, and let its rows be i.i.d. complex normal vectors with mean 0 and covariance Σ_p. We show that for a large class of covariance matrices Σ_p, the largest eigenvalue of X*X is asymptotically distributed (after recentering and rescaling) as the Tracy-Widom distribution that appears in the study of the Gaussian unitary ensemble. We give explicit formulas for the centering and scaling sequences that are easy to implement and involve only the spectral distribution of the population covariance, n and p. The main theorem applies to a number of covariance models found in applications. For example, well-behaved Toeplitz matrices, as well as covariance matrices whose spectral distribution is a sum of atoms (under some conditions on the mass of the atoms), are among the models the theorem can handle. Generalizations of the theorem to certain spiked versions of our models and a.s. results about the largest eigenvalue are given. We also discuss a simple corollary that does not require normality of the entries of the data matrix, and some consequences for applications in multivariate statistics.
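A small Monte Carlo sketch can make the recentering-and-rescaling idea concrete. The centering and scaling constants below are the standard ones for the special white case Σ_p = I with complex Gaussian entries; the paper's general formulas, which involve the population spectral distribution, are not reproduced here.

```python
# Monte Carlo sketch: fluctuations of the largest eigenvalue of a complex
# white Wishart matrix (Sigma_p = I). The centering/scaling constants below
# are the standard ones for this special case only; the general formulas
# from the paper are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 400, 100, 500

# Centering and scaling for the complex Gaussian (GUE-type) white case.
mu = (np.sqrt(n) + np.sqrt(p)) ** 2
sigma = (np.sqrt(n) + np.sqrt(p)) * (1 / np.sqrt(n) + 1 / np.sqrt(p)) ** (1 / 3)

stats = []
for _ in range(reps):
    # n x p data matrix with i.i.d. standard complex normal entries.
    X = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
    lmax = np.linalg.eigvalsh(X.conj().T @ X)[-1]   # largest eigenvalue of X*X
    stats.append((lmax - mu) / sigma)

# The recentered/rescaled values should be approximately Tracy-Widom (beta = 2),
# whose mean is about -1.77 and standard deviation about 0.90.
print(np.mean(stats), np.std(stats))
```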
Estimating covariance matrices is a problem of fundamental importance in multivariate statistics. In practice it is increasingly frequent to work with data matrices X of dimension n × p, where p and n are both large. Results from random matrix theory show very clearly that in this setting, standard estimators like the sample covariance matrix perform in general very poorly. In this "large n, large p" setting, it is sometimes the case that practitioners are willing to assume that many elements of the population covariance matrix are equal to 0, and hence that this matrix is sparse. We develop an estimator to handle this situation. The estimator is shown to be consistent in operator norm when, for instance, we have p ≍ n as n → ∞. In other words, the largest singular value of the difference between the estimator and the population covariance matrix goes to zero. This implies consistency of all the eigenvalues and consistency of eigenspaces associated with isolated eigenvalues. We also propose a notion of sparsity for matrices that is "compatible" with spectral analysis and is independent of the ordering of the variables.
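The abstract does not spell out the estimator, but entrywise hard thresholding of the sample covariance at a level of order sqrt(log(p)/n) is the canonical construction in this line of work. The sketch below is that generic construction, with an arbitrary illustrative threshold constant, not the paper's tuned procedure.

```python
# Generic hard-thresholding covariance estimator for a sparse Sigma: keep an
# entry of the sample covariance only if it exceeds a threshold of order
# sqrt(log(p)/n). The constant c is an arbitrary illustrative choice.
import numpy as np

def threshold_covariance(X, c=1.0):
    """X: n x p data matrix (rows are observations). Returns a p x p estimate."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)          # sample covariance
    t = c * np.sqrt(np.log(p) / n)       # threshold level
    S_hat = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(S_hat, np.diag(S))  # keep the variances intact
    return S_hat

# Example: sparse tridiagonal population covariance, p comparable to n.
rng = np.random.default_rng(1)
n, p = 500, 250
Sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S_hat = threshold_covariance(X, c=1.5)
# Operator-norm errors: thresholded estimator vs raw sample covariance.
print(np.linalg.norm(S_hat - Sigma, 2), np.linalg.norm(np.cov(X, rowvar=False) - Sigma, 2))
```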
We place ourselves in the setting of high-dimensional statistical inference, where the number of variables $p$ in a dataset of interest is of the same order of magnitude as the number of observations $n$. We consider the spectrum of certain kernel random matrices, in particular $n\times n$ matrices whose $(i,j)$th entry is $f(X_i'X_j/p)$ or $f(\Vert X_i-X_j\Vert^2/p)$, where $p$ is the dimension of the data and the $X_i$ are independent data vectors. Here $f$ is assumed to be a locally smooth function. The study is motivated by questions arising in statistics and computer science, where these matrices are used to perform, among other things, nonlinear versions of principal component analysis. Surprisingly, we show that in high dimensions, and for the models we analyze, the problem becomes essentially linear, which is at odds with heuristics sometimes used to justify the usage of these methods. The analysis also highlights certain peculiarities of models widely studied in random matrix theory and raises some questions about their relevance as tools to model high-dimensional data encountered in practice.
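One way to see the "essentially linear" phenomenon numerically is to compare the spectrum of an inner-product kernel matrix $f(X_i'X_j/p)$ with that of a linear surrogate built from a Taylor expansion of $f$. The surrogate below (terms in $f(0)$, $f'(0)$, $f(1)$, for identity population covariance) is the usual expansion heuristic written from scratch for illustration; it is our reading of the phenomenon, not a formula quoted from the paper.

```python
# Numerical illustration: in high dimensions, an inner-product kernel matrix
# f(X_i'X_j / p) behaves like a *linear* surrogate obtained by Taylor
# expansion. The surrogate (f(0), f'(0), f(1) terms, identity population
# covariance) is the usual expansion heuristic, not a quoted formula.
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 600
X = rng.standard_normal((n, p))          # independent rows, identity covariance

f = np.exp                                # a smooth test function
G = X @ X.T / p                           # matrix of inner products X_i'X_j / p
K = f(G)                                  # the kernel random matrix (entrywise f)

# Linear surrogate: off-diagonal entries ~ f(0) + f'(0) * X_i'X_j/p, since
# these inner products are O(1/sqrt(p)); diagonal entries ~ f(1), since
# X_i'X_i/p concentrates around 1.
ones = np.ones((n, n))
fprime0 = 1.0                             # f'(0) for f = exp
A = f(0.0) * ones + fprime0 * G + (f(1.0) - f(0.0) - fprime0) * np.eye(n)

# The top of the two spectra should be close.
print(np.sort(np.linalg.eigvalsh(K))[-5:])
print(np.sort(np.linalg.eigvalsh(A))[-5:])
```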
We study regression M-estimates in the setting where p, the number of covariates, and n, the number of observations, are both large, but p ≤ n. We find an exact stochastic representation for the distribution of $\hat{\beta} = \arg\min_{\beta\in\mathbb{R}^p} \sum_{i=1}^n \rho(Y_i - X_i'\beta)$ at fixed p and n under various assumptions on the objective function $\rho$ and our statistical model. A scalar random variable whose deterministic limit $r_\rho(\kappa)$ can be studied when $p/n \to \kappa > 0$ plays a central role in this representation. We discover a nonlinear system of two deterministic equations that characterizes $r_\rho(\kappa)$. Interestingly, the system shows that $r_\rho(\kappa)$ depends on $\rho$ through proximal mappings of $\rho$, as well as on various aspects of the statistical model underlying our study. Several surprising results emerge. In particular, we show that, when $p/n$ is large enough, least squares becomes preferable to least absolute deviations for double-exponential errors.

Keywords: prox function | high-dimensional statistics | concentration of measure

In the "classical" period up to the 1980s, research on regression models focused on situations for which the number of covariates p was much smaller than n, the sample size. Least-squares regression (LSE) was the main fitting tool used, but its sensitivity to outliers came to the fore with the work of Tukey, Huber, Hampel, and others starting in the 1950s. Given the model $Y_i = X_i'\beta_0 + \epsilon_i$ and the M-estimation methods described in the abstract, it follows from the discussion in ref. 1 (p. 170, for instance) that, if the design matrix X (an n × p matrix whose ith row is $X_i$) is nonsingular, then under various regularity conditions on X, $\rho$, $\psi = \rho'$ and the independent, identically distributed errors $\epsilon_i$, $\hat{\beta}$ is asymptotically normal with mean $\beta_0$ and covariance matrix $C(\rho, \epsilon)(X'X)^{-1}$. Here, $C(\rho, \epsilon) = E(\psi^2(\epsilon))/[E(\psi'(\epsilon))]^2$ and $\epsilon$ has the same distribution as the $\epsilon_i$'s. It follows that, for p fixed, the relative efficiency of M-estimates such as least absolute deviations (LAD) relative to LSE does not depend on the design matrix. Thus, LAD has the same advantage over LSE for heavy-tailed distributions as the median has over the mean. In recent years, there has been great focus on the case where p and n are commensurate and large. Greatest attention has been paid to the "sparse" case where the number of nonzero coefficients is much smaller than n or p. This has been achieved by adding an $\ell_1$ type of penalty to the quadratic objective function of LSE, in the case of the Least Absolute Shrinkage and Selection Operator (LASSO). Unfortunately, these types of methods result in biased estimates of the coefficients, and statistical inference, as opposed to prediction, becomes problematic. Huber (2) was the first to investigate the regime of large p (p → ∞ with n). His results were followed up by Portnoy (3) under weaker conditions [see also Bloomfield (4)]. Huber showed that the behavior found for fixed p persisted in regions such as $p^2/n \to 0$ and $p^3/n \to 0$. That is, estimates of coefficients and contrasts were asymptotically Gaussian and relative efficiencies of methods did not depend on ...
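A small simulation can probe the LSE-vs-LAD comparison under double-exponential errors when p/n is not small. The LAD fit below uses the standard linear-programming formulation of least absolute deviations; the dimensions, error scale, and replication counts are arbitrary illustrative choices, not settings from the paper.

```python
# Monte Carlo sketch: LSE vs LAD under double-exponential (Laplace) errors
# with p/n = 0.5. LAD is computed via its standard LP formulation:
# min sum(t) subject to -t <= y - X b <= t. All constants are illustrative.
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """Least absolute deviations regression via linear programming."""
    n, p = X.shape
    # Decision variables: [beta (p, free), t (n, >= 0)]; objective: sum of t.
    c = np.concatenate([np.zeros(p), np.ones(n)])
    A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(3)
n, p, reps = 200, 100, 50          # kappa = p/n = 0.5
beta0 = np.zeros(p)                 # true coefficients

err_ls, err_lad = [], []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = X @ beta0 + rng.laplace(scale=1.0, size=n)   # double-exponential errors
    b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
    b_lad = lad_fit(X, y)
    err_ls.append(np.sum((b_ls - beta0) ** 2))
    err_lad.append(np.sum((b_lad - beta0) ** 2))

print("mean squared error, LSE:", np.mean(err_ls))
print("mean squared error, LAD:", np.mean(err_lad))
```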