Common variants implicated by genome-wide association studies (GWAS) of complex diseases are known to be enriched for coding and regulatory variants. We applied methods to partition the heritability explained by genotyped SNPs ($h_g^2$) across functional categories (while accounting for shared variance due to linkage disequilibrium) to genotype and imputed data for 11 common diseases. DNaseI Hypersensitivity Sites (DHS) from 218 cell-types, spanning 16% of the genome, explained an average of 79% of $h_g^2$ (5.1× enrichment; $P < 10^{-20}$); further enrichment was observed at enhancer and cell-type specific DHS elements. The enrichments were much smaller in analyses that did not use imputed data or were restricted to GWAS-associated SNPs. In contrast, coding variants, spanning 1% of the genome, explained only 8% of $h_g^2$ (13.8× enrichment; $P = 5 \times 10^{-4}$). We replicated these findings but found no significant contribution from rare coding variants in an independent schizophrenia cohort genotyped on GWAS and exome chips.

Recent work by ENCODE and other projects has shown that specific classes of non-coding variants can have complex and diverse impacts on cell function and phenotype [1-7]. With many potentially informative functional categories and competing biological hypotheses, quantifying the contribution of variants in these categories to complex traits would inform trait biology and focus fine-mapping. The availability of significantly associated variants from hundreds of genome-wide association studies (GWAS) [8] has opened one avenue for quantifying enrichment. Indeed, 11% of GWAS hits lie in coding regions [8] and 57% of GWAS hits lie in broadly-defined DHS (spanning 42% of the genome) [5], with additional GWAS hits tagging these regions. The full distribution of GWAS association statistics exhibits enriched P-values in coding and untranslated regions (UTR) [9].
Analysis of DHS sub-classes and other histone marks has revealed a complex pattern of cell-type specific relationships with known disease associations [4]. However, the question of how much each functional category contributes to disease heritability remains unanswered [10].

Here, we jointly estimate the heritability explained by all SNPs ($h_g^2$) in different functional categories, generalizing recent work using variance-component methods [11-17]. In contrast to analyses of top GWAS hits, this approach leverages the entire polygenic architecture of each trait and can obtain accurate estimates even in the face of pervasive linkage disequilibrium (LD) across functional categories, as we show via extensive simulations. We apply this approach to functional categories in GWAS and exome chip data from >50,000 samples.
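The fold enrichments quoted for each functional category can be read as a ratio of two shares. The following is a minimal sketch, assuming enrichment is the share of $h_g^2$ attributed to a category divided by the share of SNPs falling in that category. The function name and the toy category values are hypothetical and are not taken from the study. The reported fold values need not equal the ratio of the rounded percentages in the abstract, so the study's exact denominators may differ from this simplification.

```python
def fold_enrichment(h2_share: float, snp_share: float) -> float:
    """Return the share of heritability divided by the share of SNPs.

    Both arguments are fractions in (0, 1]. A value above 1 means the
    category explains more heritability than its size alone would suggest.
    """
    if not 0.0 < snp_share <= 1.0:
        raise ValueError("snp_share must be in (0, 1]")
    if not 0.0 <= h2_share <= 1.0:
        raise ValueError("h2_share must be in [0, 1]")
    return h2_share / snp_share


# Hypothetical categories: (share of h2g explained, share of SNPs).
toy_categories = {
    "category_A": (0.60, 0.20),
    "category_B": (0.10, 0.02),
    "remainder": (0.30, 0.78),
}

for name, (h2_share, snp_share) in toy_categories.items():
    print(f"{name}: {fold_enrichment(h2_share, snp_share):.2f}x")
```

In this toy setup the shares in each column sum to one, which mirrors the idea of partitioning a fixed total across non-overlapping categories.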
The low portability of polygenic scores (PGS) across global populations is a major concern that must be addressed before PGS can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and in their data collection and assaying, which conflates multiple factors. In this study, we examine the extent to which PGS are transferable between ancestries by deriving polygenic scores for 240 curated traits from the UK Biobank data and applying them in eight ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding that arises from using different cohorts. We define the eight ancestry groups at a high-resolution, country-specific level, based on a simple, robust and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 240 phenotypes, and show a systematic and dramatic reduction in portability of PGS trained in the inferred ancestral UK population and applied to each of the inferred ancestral Polish, Italian, Iranian, Indian, Chinese, Caribbean and Nigerian populations. These analyses, performed at a finer scale than the usual continental scale, demonstrate that prediction accuracy already drops off within European ancestries and declines globally in proportion to PC distance, even when all individuals reside in the same country and are genotyped and phenotyped as part of the same cohort. Our study provides high-resolution and robust insights into the PGS portability problem and offers clues towards possible solutions.
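The relationship between prediction accuracy and PC distance described in this abstract can be illustrated with a small sketch. It assumes that PC distance is the Euclidean distance between group centroids in principal-component space and that portability is summarised as accuracy relative to the training group. Both are simplifying assumptions, and the function names and toy numbers below are hypothetical rather than the study's method or results.

```python
import numpy as np


def centroid_distance(pcs_train: np.ndarray, pcs_target: np.ndarray) -> float:
    """Euclidean distance between two group centroids in PC space.

    Each input is an (individuals x PCs) array.
    """
    return float(np.linalg.norm(pcs_target.mean(axis=0) - pcs_train.mean(axis=0)))


def fit_decay(distances: np.ndarray, relative_accuracy: np.ndarray) -> tuple[float, float]:
    """Least-squares line of relative accuracy against PC distance.

    Returns (slope, intercept); a negative slope indicates that accuracy
    falls as the target group moves away from the training group.
    """
    slope, intercept = np.polyfit(distances, relative_accuracy, 1)
    return float(slope), float(intercept)


# Toy PCs for two hypothetical groups (two individuals, two PCs each).
train_pcs = np.array([[0.0, 0.0], [2.0, 0.0]])
target_pcs = np.array([[4.0, 4.0], [4.0, 4.0]])
print(centroid_distance(train_pcs, target_pcs))

# Toy distances and relative accuracies for four hypothetical groups.
toy_distances = np.array([0.0, 1.0, 2.0, 3.0])
toy_relative_accuracy = np.array([1.0, 0.8, 0.6, 0.4])
print(fit_decay(toy_distances, toy_relative_accuracy))
```

A straight line is the simplest reading of "in proportion to PC distance"; other distance definitions or accuracy measures would change the fitted values.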
Polygenic risk scores (PRSs) are expected to play a critical role in achieving precision medicine. PRS predictors are generally based on linear models using summary statistics and, more recently, individual-level data. However, these predictors generally capture only additive relationships and are limited in the types of data they can use. Here, we develop a deep learning framework (EIR) for PRS prediction which includes a model, genome-local-net (GLN), that we specifically designed for large-scale genomics data. The framework supports multi-task (MT) learning, automatic integration of clinical and biochemical data, and model explainability. GLN outperforms LASSO for a wide range of diseases, particularly autoimmune diseases, which have been studied for interaction effects. We showcase the flexibility of the framework by training one MT model to predict 338 diseases simultaneously. Furthermore, we find that incorporating measurement data for PRSs improves performance for virtually all (93%) diseases considered (ROC-AUC improvement up to 0.36) and that including genotype data provides better model calibration compared to measurements alone. We use the framework to analyse what our models learn and find that they learn both relevant disease variants and clinical measurements. EIR is open source and available at https://github.com/arnor-sigurdsson/EIR.
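For contrast with the deep learning approach, the additive linear predictor that this abstract describes as the usual baseline can be sketched as a weighted sum of allele counts. This is a generic illustration and not EIR or GLN code. The function name, genotype matrix and effect sizes are hypothetical, and genotypes are assumed to be coded as 0, 1 or 2 copies of the effect allele.

```python
import numpy as np


def additive_score(genotypes: np.ndarray, effect_sizes: np.ndarray) -> np.ndarray:
    """Additive polygenic score: one weighted sum of allele counts per individual.

    genotypes is an (individuals x variants) array of allele counts and
    effect_sizes is a length-variants vector of per-allele weights.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if genotypes.ndim != 2 or genotypes.shape[1] != effect_sizes.shape[0]:
        raise ValueError("genotypes must be 2-D with one column per effect size")
    return genotypes @ effect_sizes


# Two hypothetical individuals genotyped at three hypothetical variants.
toy_genotypes = np.array([[0, 1, 2],
                          [2, 0, 1]])
toy_effects = np.array([0.1, -0.2, 0.3])
print(additive_score(toy_genotypes, toy_effects))
```

Because each variant contributes independently of the others, a model of this form cannot represent interaction effects between variants, which is the limitation the abstract points to.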
We present lassosum2, a new version of the polygenic score method lassosum, which we re-implement in the R package bigsnpr. This new version uses exactly the same input data as LDpred2 and is also very fast, which means that it can be run with almost no extra coding or computational time when already running LDpred2. It can also be more robust than LDpred2, e.g. when the GWAS sample size is badly misspecified. Therefore, lassosum2 is complementary to LDpred2.