2020
DOI: 10.1093/bioinformatics/btaa152

Fast and robust ancestry prediction using principal component analysis

Abstract: Motivation: Population stratification (PS) is a major confounder in genome-wide association studies (GWAS) and can lead to false-positive associations. To adjust for PS, principal component analysis (PCA)-based ancestry prediction has been widely used. Simple projection (SP) based on principal component loadings and the recently developed data augmentation, decomposition and Procrustes (ADP) transformation, such as LASER and TRACE, are popular methods for predicting PC scores. However, the pre…
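The simple projection (SP) named in the abstract can be sketched as follows. The sketch makes three assumptions. Genotypes are NumPy arrays with samples as rows. Variants are only centred by the reference means, although many pipelines also scale them. The function name `simple_projection` is hypothetical.

```python
import numpy as np


def simple_projection(ref, new, k):
    """Project study samples onto the top-k PC loadings of a reference set.

    ref: (n, p) reference genotypes; new: (m, p) study genotypes on the
    same p variants. Returns (reference scores (n, k), projected scores (m, k)).
    """
    mu = ref.mean(axis=0)                  # per-variant reference mean
    ref_c = ref - mu                       # centre the reference
    # Thin SVD: ref_c = U @ diag(d) @ Vt; rows of Vt are the PC loadings
    U, d, Vt = np.linalg.svd(ref_c, full_matrices=False)
    loadings = Vt[:k].T                    # (p, k)
    ref_scores = U[:, :k] * d[:k]          # identical to ref_c @ loadings
    new_scores = (new - mu) @ loadings     # SP: centre with reference mean, multiply by loadings
    return ref_scores, new_scores
```

Projecting the reference samples onto their own loadings reproduces their PC scores exactly. Only genuinely new samples are exposed to the shrinkage toward 0 described in the citing literature.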


Cited by 36 publications (27 citation statements); references 15 publications.
“…To project PCs of a reference dataset (e.g. 1000G) to a target genotype dataset, we implement the following 3 steps in function of package bigsnpr: 1) matching the variants of each dataset, including removing ambiguous alleles [A/T] and [C/G], and matching strand and direction of the alleles; 2) computing PCA of the reference dataset using the matched variants only; 3) projecting computed PCs to the target data using an optimised implementation (see Supplementary Materials) of the Online Augmentation, Decomposition, and Procrustes (OADP) transformation (Zhang et al 2019). To project individuals from the same dataset as the ones used for computing PCA, you can use function .…”
Section: Methods (mentioning)
confidence: 99%
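Step 1 of the quoted procedure removes ambiguous [A/T] and [C/G] variants and matches strand and allele direction. A minimal sketch of that step follows. It assumes variants are keyed by chromosome and position and that alleles are single-nucleotide SNPs. The names `Variant`, `orientation` and `match_variants` are hypothetical and are not bigsnpr's API.

```python
from typing import NamedTuple, Optional

# Watson-Crick complements; multi-base alleles (indels) never match on a strand flip here
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class Variant(NamedTuple):
    chrom: str
    pos: int
    a0: str  # first allele
    a1: str  # counted allele (genotype = number of a1 copies)


def is_ambiguous(v: Variant) -> bool:
    # A/T and C/G SNPs read the same on both strands, so strand cannot be resolved
    return COMPLEMENT.get(v.a0) == v.a1


def orientation(ref: Variant, tgt: Variant) -> Optional[bool]:
    """False: same direction; True: alleles reversed; None: incompatible alleles."""
    candidates = [(tgt.a0, tgt.a1),                                  # same strand
                  (COMPLEMENT.get(tgt.a0), COMPLEMENT.get(tgt.a1))]  # flipped strand
    for a0, a1 in candidates:
        if (a0, a1) == (ref.a0, ref.a1):
            return False
        if (a0, a1) == (ref.a1, ref.a0):
            return True
    return None


def match_variants(ref, tgt):
    """Return (ref_index, tgt_index, reversed) triples for matched variants.

    When reversed is True, target genotypes g in {0, 1, 2} should be recoded
    as 2 - g so that both datasets count the same allele.
    """
    by_pos = {}
    for j, t in enumerate(tgt):
        by_pos.setdefault((t.chrom, t.pos), []).append((j, t))
    matches = []
    for i, r in enumerate(ref):
        if is_ambiguous(r):
            continue  # drop [A/T] and [C/G] variants
        for j, t in by_pos.get((r.chrom, r.pos), []):
            rev = orientation(r, t)
            if rev is not None:
                matches.append((i, j, rev))
                break
    return matches
```

The target's ambiguity is not tested separately. Complementing alleles preserves ambiguity, so an unambiguous reference variant can only match an unambiguous target variant.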
“…We implement an optimised version of the Online Augmentation, Decomposition, and Procrustes (OADP) transformation when using K′′ = K′ = K (Zhang et al 2019). We assume that the K-partial Singular Value Decomposition (SVD) of the reference matrix X (of size n × p) has been computed as U Δ V^T.…”
Section: Supplementary Materials (mentioning)
confidence: 99%
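The quoted passage builds on the ADP transformation, which OADP is designed to compute more cheaply. Below is a sketch of plain, non-online ADP with K′′ = K′ = k. Each study sample is appended to the centred reference, the augmented matrix is decomposed with a full SVD, and a Procrustes map with scaling, rotation and translation is fitted from the augmented reference rows to the reference scores U Δ. That map is then applied to the study sample's row. Centring with the reference mean is an assumption. The names `procrustes` and `adp_project` are hypothetical. This is not the optimised OADP implementation.

```python
import numpy as np


def procrustes(target, source):
    """Scaling rho, orthogonal A and shift b minimising ||target - (rho*source@A + b)||_F."""
    mt, ms = target.mean(axis=0), source.mean(axis=0)
    T, S = target - mt, source - ms
    W, sig, Zt = np.linalg.svd(S.T @ T)      # S^T T = W diag(sig) Z^T
    A = W @ Zt                               # optimal orthogonal map
    rho = sig.sum() / (S ** 2).sum()         # optimal isotropic scaling
    b = mt - rho * ms @ A                    # optimal translation
    return rho, A, b


def adp_project(ref, new, k):
    """Plain (non-online) ADP with K'' = K' = k, one study sample at a time."""
    mu = ref.mean(axis=0)
    ref_c = ref - mu
    U, d, _ = np.linalg.svd(ref_c, full_matrices=False)  # ref_c = U diag(d) V^T
    ref_scores = U[:, :k] * d[:k]                        # reference PC scores (U Delta)
    out = np.empty((new.shape[0], k))
    for i, y in enumerate(new):
        # Augmentation: append one study sample to the centred reference
        aug = np.vstack([ref_c, y - mu])
        # Decomposition: PCA of the augmented (n + 1) x p matrix
        Ua, da, _ = np.linalg.svd(aug, full_matrices=False)
        aug_scores = Ua[:, :k] * da[:k]
        # Procrustes: map the augmented reference rows onto the reference scores,
        # then apply the same map to the study sample's row
        rho, A, b = procrustes(ref_scores, aug_scores[:-1])
        out[i] = rho * aug_scores[-1] @ A + b
    return out
```

The per-sample SVD of the whole augmented matrix is what makes plain ADP expensive. Reusing the precomputed U Δ V^T, as the quoted passage assumes, is the motivation for an online variant.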
“…Including PCs that capture LD as covariates in genetic analyses can lead to reduced power for detecting genetic associations within these LD regions (Zou et al., 2010). Second, another issue may arise when projecting a new study dataset to the PCA space computed from a reference dataset: projected PCs are shrunk toward 0 in the new dataset (Lee et al., 2010; Wang et al., 2015; Zhang et al., 2020). This shrinkage makes it potentially dangerous to use the projected PCs for analyses, such as PC regression, ancestry detection and correction for ancestry.…”
Section: Introduction (mentioning)
confidence: 99%
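The shrinkage described above can be reproduced in a small simulation. The setup is a hypothetical spiked model: one population axis with variance `signal`, unit-variance noise, and far more variants than samples. It compares the spread of PC1 scores for held-out samples projected with simple projection against the spread for the reference samples. Reference scores come from the same samples that estimated the loadings, so in high dimensions they are inflated relative to the projected ones. All parameter values are illustrative assumptions, not values from the cited studies.

```python
import numpy as np


def shrinkage_ratio(n=200, m=200, p=5000, signal=50.0, seed=0):
    """SD of projected PC1 scores divided by SD of reference PC1 scores."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=p)
    v /= np.linalg.norm(v)                 # true population axis (unit vector)

    def sample(size):
        z = rng.normal(size=(size, 1)) * np.sqrt(signal)
        return z * v + rng.normal(size=(size, p))  # structured signal + noise

    ref, new = sample(n), sample(m)
    mu = ref.mean(axis=0)
    U, d, Vt = np.linalg.svd(ref - mu, full_matrices=False)
    ref_pc1 = U[:, 0] * d[0]               # in-sample reference PC1 scores
    new_pc1 = (new - mu) @ Vt[0]           # simple projection onto PC1 loading
    return new_pc1.std() / ref_pc1.std()
```

With p much larger than n the ratio falls clearly below 1. Calling `shrinkage_ratio(p=20)` instead gives a ratio close to 1, consistent with the shrinkage being a high-dimensional effect.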
“…Secondly, dimensionality reduction [p]reserves the most contributing features of high-dimensional data, removing noise and inconsequential features, thereby achieving the goal of improving data processing speed. Principal component analysis (PCA) [35] is a widely used method of dimensionality reduction of high-dimensional data while minimizing information loss [36]. Suppose the data set is matrix X = [x_1, x_2, x_3, …, x_n], then the process of centralizing the data set matrix would be:…”
Section: Classification Of Reconstructed Xas (mentioning)
confidence: 99%
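The quoted passage breaks off before its centring formula. For reference, here is a sketch of textbook PCA under the quoted convention X = [x_1, …, x_n], with each sample stored as a column. Each column has the mean of all columns subtracted, the covariance is eigendecomposed, and the data are projected onto the top-k eigenvectors. This is the standard procedure, not necessarily the cited authors' exact formulation. The name `pca_columns` is hypothetical.

```python
import numpy as np


def pca_columns(X, k):
    """PCA of X whose n columns x_1, ..., x_n are samples with d features each.

    Returns (W, Y): W is (d, k) with the top-k principal directions and
    Y = W.T @ X_c is the (k, n) reduced representation.
    """
    X_c = X - X.mean(axis=1, keepdims=True)   # x_i <- x_i - (1/n) * sum_j x_j
    C = X_c @ X_c.T / X.shape[1]               # (d, d) sample covariance
    evals, evecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]        # indices of the k largest
    W = evecs[:, order]
    return W, W.T @ X_c
```

`eigh` is used because the covariance matrix is symmetric. The variance of each row of Y equals the corresponding retained eigenvalue.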