We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where p, n → ∞ and p/n → γ ∈ (0, ∞), and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength, and the aspect ratio γ. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover several qualitative insights about both methods: for example, with ridge regression, there is an exact inverse relation between the limiting predictive risk and the limiting estimation risk given a fixed signal strength. Our analysis builds on recent advances in random matrix theory.
This paper is concerned with an important matrix condition in compressed sensing known as the restricted isometry property (RIP). We demonstrate that testing whether a matrix satisfies RIP is NP-hard. As a consequence of our result, it is impossible to efficiently test for RIP provided P ≠ NP.
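To make the property concrete, here is a minimal sketch of a brute-force check, restricted to sparsity level 2 so that the eigenvalues of each 2 × 2 Gram submatrix have a closed form. The function name `rip_constant_order2` and the column-list input format are assumptions made for illustration; they do not come from the paper. The sketch returns the smallest δ such that (1 − δ)‖x‖² ≤ ‖Ax‖² ≤ (1 + δ)‖x‖² for every 2-sparse x.

```python
from itertools import combinations
from math import sqrt


def rip_constant_order2(columns):
    """Smallest delta such that (1 - delta)|x|^2 <= |Ax|^2 <= (1 + delta)|x|^2
    for every 2-sparse x, where `columns` lists the columns of A.

    Hypothetical helper for illustration: it enumerates every pair of columns.
    """
    delta = 0.0
    for u, v in combinations(columns, 2):
        # Entries of the 2x2 Gram matrix [[a, b], [b, c]] of the column pair.
        a = sum(x * x for x in u)
        c = sum(y * y for y in v)
        b = sum(x * y for x, y in zip(u, v))
        # Closed-form eigenvalues of a symmetric 2x2 matrix.
        mid = (a + c) / 2
        rad = sqrt(((a - c) / 2) ** 2 + b * b)
        # Track the worst deviation of an eigenvalue from 1.
        delta = max(delta, abs(mid + rad - 1), abs(mid - rad - 1))
    return delta


# Orthonormal columns are a perfect isometry on sparse vectors.
print(rip_constant_order2([[1.0, 0.0], [0.0, 1.0]]))
```

For sparsity level k, the same enumeration visits every size-k subset of columns, and the number of such subsets grows combinatorially with the matrix dimensions. That is why a direct check of this kind does not scale, in line with the hardness result stated above.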
We developed a new statistical framework to find genetic variants associated with extreme longevity. The method, informed GWAS (iGWAS), takes advantage of knowledge from large studies of age-related disease in order to narrow the search for SNPs associated with longevity. To gain support for our approach, we first show there is an overlap between loci involved in disease and loci associated with extreme longevity. These results indicate that several disease variants may be depleted in centenarians versus the general population. Next, we used iGWAS to harness information from 14 meta-analyses of disease and trait GWAS to identify longevity loci in two studies of long-lived humans. In a standard GWAS analysis, only one locus in these studies is significant (APOE/TOMM40) when controlling the false discovery rate (FDR) at 10%. With iGWAS, we identify eight genetic loci that are significantly associated with exceptional human longevity at FDR < 10%. We followed up the eight lead SNPs in independent cohorts, and found replication evidence for four loci and suggestive evidence for one more. The loci that replicated (FDR < 5%) included APOE/TOMM40 (associated with Alzheimer’s disease), CDKN2B/ANRIL (implicated in the regulation of cellular senescence), ABO (tags the O blood group), and SH2B3/ATXN2 (a signaling gene that extends lifespan in Drosophila and a gene involved in neurological disease). Our results implicate new loci in longevity and reveal a genetic overlap between longevity and age-related diseases and traits, including coronary artery disease and Alzheimer’s disease. iGWAS provides a new analytical strategy for uncovering SNPs that influence extreme longevity, and can be applied more broadly to boost power in other studies of complex phenotypes.
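For readers unfamiliar with FDR thresholds such as the 10% level mentioned above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is shown only to illustrate what controlling the FDR at a level q means; it is not the iGWAS method, whose details are not given in this abstract. The function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, q=0.10):
    """Return the sorted indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is below its threshold q * rank / m.
        if pvalues[i] <= q * rank / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for five loci.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5, 0.9], q=0.10))
```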
Factor analysis and principal component analysis are used in many application areas. The first step, choosing the number of components, remains a serious challenge. Our work proposes improved methods for this important problem. One of the most popular state-of-the-art methods is parallel analysis (PA), which compares the observed factor strengths with simulated strengths under a noise‐only model. The paper proposes improvements to PA. We first derandomize it, proposing deterministic PA, which is faster and more reproducible than PA. Both PA and deterministic PA are prone to a shadowing phenomenon in which a strong factor makes it difficult to detect smaller but more interesting factors. We propose deflation to counter shadowing. We also propose to raise the decision threshold to improve estimation accuracy. We prove several consistency results for our methods, and test them in simulations. We also illustrate our methods on data from the human genome diversity project, where they significantly improve the accuracy.
Consider an n × p data matrix X whose rows are independently sampled from a population with covariance Σ. When n, p are both large, the eigenvalues of the sample covariance matrix are substantially different from those of the true covariance. Asymptotically, as n, p → ∞ with p/n → γ, there is a deterministic mapping from the population spectral distribution (PSD) to the empirical spectral distribution (ESD) of the eigenvalues. The mapping is characterized by a fixed-point equation for the Stieltjes transform. We propose a new method to compute numerically the output ESD from an arbitrary input PSD. Our method, called Spectrode, finds the support and the density of the ESD to high precision; we prove this for finite discrete distributions. In computational experiments it outperforms existing methods by several orders of magnitude in speed and accuracy. We apply Spectrode to compute expectations and contour integrals of the ESD. These quantities are often central in applications of random matrix theory (RMT). We illustrate that Spectrode is directly useful in statistical problems, such as estimation and hypothesis testing for covariance matrices. Our proposal may make it more convenient to use asymptotic RMT in aspects of high-dimensional data analysis.
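The fixed-point characterization can be illustrated with a naive solver. The sketch below is not Spectrode; it is a simple damped fixed-point iteration, assuming the Silverstein form of the equation, z = −1/v + γ ∫ t/(1 + t v) dH(t), where H is a discrete PSD with atoms `eigs` and masses `weights`, and v is the companion Stieltjes transform, related to the Stieltjes transform m of the ESD by v(z) = γ m(z) − (1 − γ)/z. The function names, the damping factor, and the smoothing parameter `eps` are illustrative choices.

```python
from math import pi


def companion_stieltjes(z, gamma, eigs, weights, iters=2000, tol=1e-12):
    """Solve z = -1/v + gamma * sum_i w_i t_i / (1 + t_i v) for v in the
    upper half-plane by damped fixed-point iteration (illustrative only)."""
    v = 1j  # any starting point with positive imaginary part
    for _ in range(iters):
        integral = sum(w * t / (1 + t * v) for t, w in zip(eigs, weights))
        # Average the current iterate with the fixed-point map to damp oscillation.
        v_new = 0.5 * (v + 1 / (-z + gamma * integral))
        if abs(v_new - v) < tol:
            return v_new
        v = v_new
    return v


def esd_density(x, gamma, eigs, weights, eps=1e-4):
    """Approximate the ESD density at x > 0 as Im m(x + i*eps) / pi."""
    z = complex(x, eps)
    v = companion_stieltjes(z, gamma, eigs, weights)
    m = (v + (1 - gamma) / z) / gamma  # Stieltjes transform of the ESD
    return m.imag / pi


# Identity covariance (PSD is a point mass at 1) with gamma = 0.5.
print(esd_density(1.0, 0.5, [1.0], [1.0]))
```

With a point mass at 1, the result can be checked against the closed-form Marchenko-Pastur density. A plain iteration like this slows down and loses accuracy near the edges of the support, which is the kind of difficulty a dedicated method is meant to address.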