2021
DOI: 10.1101/2021.03.29.437510
Preprint

Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores

Abstract: We present lassosum2, a new version of the polygenic score method lassosum, which we re-implement in the R package bigsnpr. This new version uses exactly the same input data as LDpred2 and is also very fast, so it can be run with almost no extra coding or computational time when LDpred2 is already being run. It can also be more robust than LDpred2, e.g. when the GWAS sample size is badly misspecified. Therefore, lassosum2 is complementary to LDpred2.

citations
Cited by 15 publications
(11 citation statements)
references
References 65 publications
“…Yet, there are dramatic overall trends as seen in Figure A5. In general, even though the quartiles contain approximately 3 times as many SNPs as the HapMapIII SNP set, when those SNPs are poorly imputed the performance of Vilma suffers substantially, which has also been observed in [41]. Indeed for the lowest and second lowest quartiles, we see huge drops in performance when looking across traits and target cohorts (p < 10^−16 in both cases, with median drops in r of 0.08 and 0.02 respectively).…”
Section: A3 Using Poorly Imputed Variants Can Degrade Performance
Citation type: mentioning
confidence: 88%
“…This is in contrast to most of the statistical methods used above where the inclusion of biomarkers relied mostly on previous research on the biomarkers themselves. In addition to feature selection, this sparse approach has been chosen because of previous success with SNP-based prediction [11, 54, 55, 56, 57, 58, 59] and because it has been shown to be among the best ML predictors for genetics and is often a good all-around method [56, 60].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In other cases, the GWAS summary statistics may be derived from a meta-analysis that combines data from a number of different studies. These settings may present potential mismatches and heterogeneities between the LD reference panel and GWAS summary statistics and are thus challenging to model, often leading to substantial loss in predictive power [29, 30, 68, 69].…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Summary statistics-based PRS methods can be sensitive to heterogeneities between GWAS summary statistics and the LD reference panel [8, 29, 68]. For some Bayesian methods, this mismatch can result in unpredictable behavior, with the posterior mean for the effect sizes exploding in magnitude and the estimated SNP-heritability exceeding 1 in some circumstances.…”
Section: S16 A Heuristic Test of Mismatch Between GWAS Summary Statis...
Citation type: mentioning
confidence: 99%