Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory, and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, functionally informed novel discovery of risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9%-38% increase in the number of independent associations detected at causal loci, depending on trait polygenicity and sample size. We applied FINDOR to 27 independent complex traits and diseases from the interim UK Biobank release (average N ¼ 130K). Averaged across traits, we attained a 13% increase in genome-wide significant loci detected (including a 20% increase for disease traits) compared to unweighted raw p values that do not use functional data. We replicated the additional loci in independent UK Biobank and non-UK Biobank data, yielding a highly statistically significant replication slope (0.66-0.69) in each case. Finally, we applied FINDOR to the full UK Biobank release (average N ¼ 416K), attaining smaller relative improvements (consistent with simulations) but larger absolute improvements, detecting an additional 583 GWAS loci. In conclusion, leveraging functional enrichment using our method robustly increases GWAS power.
Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, Functionally Informed Novel Discovery Of Risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9-38% increase in the number of independent associations detected at causal loci, depending on trait polygenicity and sample size. We applied FINDOR to 27 independent complex traits and diseases from the interim UK Biobank release (average N =130K). Averaged across traits, we attained a 13% increase in genome-wide significant loci detected (including a 20% increase for disease traits) compared to unweighted raw p-values that do not use functional data. We replicated the novel loci in independent UK Biobank and non-UK Biobank data, yielding a highly statistically significant replication slope (0.66-0.69) in each case. Finally, we applied FINDOR to the full UK Biobank release (average N =416K), attaining smaller relative improvements (consistent with simulations) but larger absolute improvements, detecting an additional 583 GWAS loci. In conclusion, leveraging functional enrichment using our method robustly increases GWAS power.
Despite strong transethnic genetic correlations reported in the literature for many complex traits, the non-transferability of polygenic risk scores across populations suggests the presence of population-specific components of genetic architecture. We propose an approach that models GWAS summary data for one trait in two populations to estimate genome-wide proportions of population-specific/shared causal SNPs. In simulations across various genetic architectures, we show that our approach yields approximately unbiased estimates with in-sample LD and slight upward-bias with out-of-sample LD. We analyze nine complex traits in individuals of East Asian and European ancestry, restricting to common SNPs (MAF > 5%), and find that most common causal SNPs are shared by both populations. Using the genome-wide estimates as priors in an empirical Bayes framework, we perform fine-mapping and observe that high-posterior SNPs (for both the population-specific and shared causal configurations) have highly correlated effects in East Asians and Europeans. In population-specific GWAS risk regions, we observe a 2.83 enrichment of shared high-posterior SNPs, suggesting that population-specific GWAS risk regions harbor shared causal SNPs that are undetected in the other GWASs due to differences in LD, allele frequencies, and/or sample size. Finally, we report enrichments of shared high-posterior SNPs in 53 tissue-specific functional categories and find evidence that SNP-heritability enrichments are driven largely by many low-effect common SNPs.
While the cohort level accuracy of polygenic risk score has been widely assessed, uncertainty in PRS—estimates of genetic value at the individual level remains underexplored. Here we show that Bayesian PRS methods can estimate the variance of an individual’s PRS and can yield well-calibrated credible intervals with posterior sampling. For real traits in the UK Biobank (N=291,273 unrelated “white British”) we observe large variance in individual PRS estimates which impacts interpretation of PRS-based stratification; averaging across 13 traits, only 0.8% (s.d. 1.6%) of individuals with PRS point estimates in the top decile have their entire 95% credible intervals fully contained in the top decile. We provide an analytical estimator for expected individual PRS variance—a function of SNP-heritability, number of causal SNPs, and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
1Although recent studies provide evidence for a common genetic basis between complex traits 2 and Mendelian disorders, a thorough quantification of their overlap in a phenotype-specific 3 manner remains elusive. Here, we quantify the overlap of genes identified through large-scale 4 genome-wide association studies (GWAS) for 62 complex traits and diseases with genes known 5 to cause 20 broad categories of Mendelian disorders. We identify a significant enrichment of 6 phenotypically-matched Mendelian disorder genes in GWAS gene sets. Further, we observe 7 elevated GWAS effect sizes near phenotypically-matched Mendelian disorder genes. Finally, we 8 report examples of GWAS variants localized at the transcription start site or physically 9 interacting with the promoters of phenotypically-matched Mendelian disorder genes. Our results 1 0 are consistent with the hypothesis that genes that are disrupted in Mendelian disorders are 1 1 dysregulated by noncoding variants in complex traits, and demonstrate how leveraging findings 1 2 from related Mendelian disorders and functional genomic datasets can prioritize genes that are 1 3 putatively dysregulated by local and distal non-coding GWAS variants.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.