2023
DOI: 10.1073/pnas.2302028120

Fundamental limits in structured principal component analysis and how to reach them

Abstract: How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide cha…
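
To make the model in the abstract concrete, the sketch below simulates a rank-one matrix corrupted by additive noise and estimates the spike with PCA (the top eigenvector). It is only an illustration under stated assumptions: the noise is symmetric i.i.d. Gaussian (a Wigner matrix), i.e. the usual independence assumption that the paper goes beyond, not its structured ensemble; the ±1 signal, the scaling `snr / n`, and the function names `spiked_wigner` and `pca_overlap` are hypothetical choices made for this example.

```python
import numpy as np


def spiked_wigner(n, snr, rng):
    """Sample Y = (snr / n) * x x^T + Z with symmetric Gaussian noise Z.

    Assumption for illustration: x has i.i.d. +/-1 entries and Z is a
    Wigner matrix whose off-diagonal entries have variance 1/n.
    """
    x = rng.choice([-1.0, 1.0], size=n)
    g = rng.standard_normal((n, n))
    z = (g + g.T) / np.sqrt(2 * n)  # symmetrize; off-diagonal variance 1/n
    y = (snr / n) * np.outer(x, x) + z
    return y, x


def pca_overlap(y, x):
    """Return (|cosine| between the top eigenvector of y and x, top eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(y)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                    # unit-norm top eigenvector
    overlap = abs(v @ x) / np.linalg.norm(x)
    return overlap, eigvals[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for snr in (0.5, 1.5, 3.0):
        y, x = spiked_wigner(500, snr, rng)
        overlap, top = pca_overlap(y, x)
        print(f"snr={snr}: overlap={overlap:.3f}, top eigenvalue={top:.3f}")
```

For this i.i.d. Gaussian case, the large-n behaviour is well known: above `snr = 1` the top eigenvalue separates from the noise bulk and the squared overlap approaches `1 - 1/snr**2`, while below it the top eigenvector is asymptotically uninformative. With correlated noise, as studied in the paper, this simple picture need not hold.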

Citations: cited by 19 publications (4 citation statements)
References: 48 publications
“…We overcome this issue by developing a new approach for GWAS inference, dubbed genomic Vector Approximate Message Passing (gVAMP). Approximate Message Passing (AMP) [11][12][13] refers to a family of iterative algorithms with several attractive properties: (i) AMP allows the usage of a wide range of Bayesian priors; (ii) the AMP performance for high-dimensional data can be precisely characterized by a simple recursion called state evolution [14]; (iii) using state evolution, joint association test statistics can be obtained [15]; and (iv) AMP achieves Bayes-optimal performance in several settings [15][16][17]. However, we find that existing AMP algorithms proposed for various applications [18][19][20][21] cannot be transferred to biobank analyses as: (i) they are entirely infeasible at scale, requiring expensive singular value decompositions; and (ii) they give diverging estimates of the signal in either simulated genomic data or the UK Biobank data.…”
Section: Overview Of the Approach
Citation type: mentioning
confidence: 99%
“…The fundamental limits of inference have been precisely characterized, and it has been shown that efficient algorithms, such as Approximate Message Passing (AMP), can meet such limits [8,9]. Specifically, AMP represents a family of iterative algorithms that has been applied to a range of statistical estimation problems, including linear regression [10,11], generalised linear models [12][13][14], and low-rank matrix estimation [9,15].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
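
As a rough illustration of the kind of iterative algorithm the statement above calls AMP, the sketch below runs a textbook-style AMP iteration for rank-one matrix estimation. It is a minimal sketch under assumptions, not the algorithm of the cited paper or of any citing work: the noise is i.i.d. Gaussian (Wigner) rather than structured, the signal has ±1 entries, `snr` is assumed known, the iteration starts from the top eigenvector, and the names `sample_spiked_wigner` and `amp_rank_one` are hypothetical.

```python
import numpy as np


def sample_spiked_wigner(n, snr, rng):
    """Y = (snr / n) * x x^T + Z, with x in {-1, +1}^n and Wigner noise Z."""
    x = rng.choice([-1.0, 1.0], size=n)
    g = rng.standard_normal((n, n))
    z = (g + g.T) / np.sqrt(2 * n)  # off-diagonal variance 1/n
    return (snr / n) * np.outer(x, x) + z, x


def amp_rank_one(y, snr, n_iter=20):
    """Sketch of AMP for a +/-1 spike under i.i.d. Gaussian noise.

    Each step applies the matrix to a denoised iterate and subtracts a
    memory ("Onsager") correction built from the previous denoised iterate.
    """
    n = y.shape[0]
    # Assumed spectral initialization: top eigenvector rescaled to norm sqrt(n).
    _, eigvecs = np.linalg.eigh(y)
    x_t = np.sqrt(n) * eigvecs[:, -1]
    f_prev = np.zeros(n)
    for _ in range(n_iter):
        f_t = np.tanh(snr * x_t)             # denoiser matched to the +/-1 prior
        b_t = snr * np.mean(1.0 - f_t ** 2)  # average derivative of the denoiser
        x_t, f_prev = y @ f_t - b_t * f_prev, f_t
    return np.tanh(snr * x_t)                # entrywise estimate in [-1, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y, x = sample_spiked_wigner(600, 2.0, rng)
    estimate = amp_rank_one(y, 2.0)
    # The sign of a rank-one spike is not identifiable, hence the absolute value.
    print("normalized overlap:", abs(estimate @ x) / len(x))
```

The correction term `b_t * f_prev` is what distinguishes this iteration from a plain power method with a nonlinearity. The form used here relies on the independence of the noise entries, so it should not be expected to remain appropriate for correlated noise.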
“…The spectral properties of low rank perturbations of high-rank matrices (such as the Wigner matrix Z) are by now largely understood in random matrix theory, and they can give rise to the celebrated BBP transition [16], further studied and extended in [17][18][19][20][21][22][23][24]. Thanks to the effort of a wide interdisciplinary community, we also have a control on the asymptotic behaviour of the posterior measure (2) and an exact formula for the free entropy associated to the low-rank problem [25][26][27][28][29][30][31][32] (recently extended to rotational invariant noise [33]), which yields the Bayes-optimal limit of the noise allowing the reconstruction of the low-rank spike. Finally, a particular class of algorithms, known as approximate message passing (AMP) [34][35][36][37][38], is able to perform factorization up to this Bayes-optimal limit.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The spectral properties of low rank perturbations of high-rank matrices (such as the Wigner matrix Z) are by now largely understood in random matrix theory, and they can give rise to the celebrated BBP transition [16], further studied and extended in [17][18][19][20][21][22][23][24]. Thanks to the effort of a wide interdisciplinary community, we also have a control on the asymptotic behaviour of the posterior measure (2) and an exact formula for the free entropy associated to the low-rank problem [25][26][27][28][29][30][31][32] (recently extended to rotational invariant noise [33]), which yields the Bayes-optimal limit of the noise allowing the reconstruction of the low-rank spike. Finally, a particular class of algorithms, known as Approximate Message Passing (AMP) [34][35][36][37][38], is able to perform factorization up to this Bayes-optimal limit.…”
Section: Introduction
Citation type: mentioning
confidence: 99%