Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a new method for polygenic prediction, LDpred-funct, that leverages trait-specific functional enrichments to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, which includes coding, conserved, regulatory and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. LDpred-funct attained higher prediction accuracy than other polygenic prediction methods in simulations using real genotypes. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank. We used association statistics from British-ancestry samples as training data (avg N=365K) and samples of other European ancestries as validation data (avg N=22K), to minimize confounding. LDpred-funct attained a +9% relative improvement in average prediction accuracy (avg prediction R²=0.145; highest R²=0.413 for height) compared to LDpred (the best method that does not incorporate functional information), consistent with simulations. For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (total N=1107K; higher heritability in UK Biobank cohort) increased prediction R² to 0.429. Our results show that modeling functional enrichment improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.

Genetic variants in functional regions of the genome are enriched for complex trait heritability 1-6.
In this study, we aim to leverage functional enrichment to improve polygenic prediction 7. Several studies have shown that incorporating prior distributions on causal effect sizes can improve prediction accuracy 9-12, compared to standard Best Linear Unbiased Prediction (BLUP) or Pruning+Thresholding methods 13-15. Recent efforts to incorporate functional information have produced promising results 16,17, but may be limited by dichotomizing between functional and non-functional variants 16 or restricting their analyses to genotyped variants 17.

Here, we introduce a new method, LDpred-funct, for leveraging trait-specific functional enrichments to increase polygenic prediction accuracy. We fit functional priors using our recently developed baseline-LD model 18, which includes coding, conserved, regulatory and LD-related annotations. LDpred-funct first analytically estimates posterior mean causal effect sizes, accounting for functional priors and LD between variants. LDpred-funct then uses cross-validation within validation samples to regularize causal effect size estimates in bins of different magnitude, improving prediction accuracy for sparse architectures. We show that LDpred-funct attains higher polygenic prediction accuracy than other methods in simulations with real genotypes, analys...
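To make the idea of a functionally informed posterior mean concrete, here is a minimal sketch under a strong simplifying assumption: no LD between variants. It assumes each causal effect has a normal prior whose variance is that SNP's expected per-SNP heritability (as might be derived from annotation enrichments), and that the marginal estimate has sampling variance of roughly 1/N. The function name, the input values, and the no-LD simplification are illustrative assumptions, not the LDpred-funct implementation, which also accounts for LD and applies a cross-validated regularization step.

```python
def posterior_mean_effects(marginal_betas, per_snp_prior_var, n_train):
    """Shrink marginal effect estimates using per-SNP prior variances.

    Simplified normal-normal model (assumes NO LD between variants):
        beta_i ~ N(0, sigma_i^2)
        beta_hat_i | beta_i ~ N(beta_i, 1 / N)
    so that E[beta_i | beta_hat_i] = sigma_i^2 / (sigma_i^2 + 1/N) * beta_hat_i.
    """
    if len(marginal_betas) != len(per_snp_prior_var):
        raise ValueError("need one prior variance per marginal estimate")
    if n_train <= 0:
        raise ValueError("n_train must be positive")
    noise_var = 1.0 / n_train  # approximate sampling variance of beta_hat
    return [
        prior_var / (prior_var + noise_var) * beta_hat
        for beta_hat, prior_var in zip(marginal_betas, per_snp_prior_var)
    ]


# Hypothetical inputs: two SNPs with the same marginal estimate, one in a
# functionally enriched annotation (larger prior variance) and one that is not.
beta_hat = [0.010, 0.010]
prior_var = [1e-5, 1e-7]
shrunk = posterior_mean_effects(beta_hat, prior_var, n_train=365_000)
```

In this toy setting the SNP with the larger prior variance keeps more of its marginal estimate, which is the intuition behind letting functional enrichment inform the prior.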
Understanding the role of rare variants is important in elucidating the genetic basis of human diseases and complex traits. It is widely believed that negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to [p(1 − p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α by maximizing its profile likelihood in a linear mixed model framework using imputed genotypes, including rare variants (MAF > 0.07%). We applied this method to 25 UK Biobank diseases and complex traits (N=113,851). All traits produced negative α estimates with 20 significantly negative, implying larger rare variant effect sizes. The inferred best-fit distribution of true α values across traits had mean −0.38 (s.e. 0.02) and standard deviation 0.08 (s.e. 0.03), with statistically significant heterogeneity across traits (P=0.0014). Despite larger rare variant effect sizes, we show that for most traits analyzed, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability. Using evolutionary modeling and forward simulations, we validated the α model of MAF-dependent trait effects and estimated the level of coupling between fitness effects and trait effects. Based on this analysis, an average genome-wide negative selection coefficient on the order of 10^−4 or stronger is necessary to explain the α values that we inferred.

(bioRxiv preprint, not peer-reviewed, first posted online Sep. 13, 2017; doi: http://dx.doi.org/10.1101/188086; made available under a CC-BY-NC-ND 4.0 International license.)
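The α model above can be explored numerically. The sketch below evaluates the relative per-allele effect variance [p(1 − p)]^α and, assuming Hardy-Weinberg equilibrium so that genotype variance is 2p(1 − p), the relative trait variance explained per SNP, which is then proportional to [p(1 − p)]^(1+α). The function names and the two example allele frequencies are illustrative assumptions; α = −0.38 is the cross-trait mean reported in the abstract.

```python
def per_allele_effect_variance(maf, alpha):
    """Relative variance of per-allele effect sizes: [p(1 - p)]^alpha."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    return (maf * (1.0 - maf)) ** alpha


def per_snp_variance_explained(maf, alpha):
    """Relative trait variance explained by one SNP.

    Assumes Hardy-Weinberg equilibrium, so genotype variance is 2p(1 - p),
    giving 2p(1 - p) * [p(1 - p)]^alpha.
    """
    return 2.0 * maf * (1.0 - maf) * per_allele_effect_variance(maf, alpha)


ALPHA = -0.38                    # mean alpha across traits, from the abstract
RARE_MAF, COMMON_MAF = 0.001, 0.30   # hypothetical example frequencies

# Ratio of rare to common per-allele effect variance (greater than 1 when alpha < 0).
effect_ratio = (per_allele_effect_variance(RARE_MAF, ALPHA)
                / per_allele_effect_variance(COMMON_MAF, ALPHA))

# Ratio of rare to common per-SNP variance explained (less than 1 when alpha > -1).
h2_ratio = (per_snp_variance_explained(RARE_MAF, ALPHA)
            / per_snp_variance_explained(COMMON_MAF, ALPHA))
```

For any α between −1 and 0, a rare variant has a larger expected per-allele effect yet explains less variance per SNP than a common one. This fits the abstract's finding that rare variants have larger effect sizes but contribute a small share of SNP-heritability, although that share also depends on how many variants fall in each frequency class.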
Polygenic risk scores derived from genotype data (PRS) and family history of disease (FH) both provide valuable information for predicting disease risk, enhancing prospects for clinical utility. PRS perform poorly when applied to diverse populations, but FH does not suffer this limitation. Here, we explore methods for combining both types of information (PRS-FH). We analyzed 10 complex diseases from the UK Biobank for which family history (parental and sibling history) was available for most target samples. PRS were trained using all British individuals (N=409K), and target samples consisted of unrelated non-British Europeans (N=42K), South Asians (N=7K), or Africans (N=7K). We evaluated PRS, FH, and PRS-FH using liability-scale R², focusing on three well-powered diseases (type 2 diabetes, hypertension, depression) with R² > 0.05 for PRS and/or FH in each target population. Averaging across these three diseases, PRS attained average prediction R² of 5.8%, 4.0%, and 0.53% in non-British Europeans, South Asians, and Africans, confirming poor cross-population transferability. In contrast, PRS-FH attained average prediction R² of 13%, 12%, and 10%, respectively, representing a large improvement in Europeans and an extremely large improvement in Africans; for each disease and each target population, the improvement was highly statistically significant. PRS-FH methods based on a logistic model and a liability threshold model performed similarly when covariates were not included in predictions (consistent with simulations), but the logistic model outperformed the liability threshold model when covariates were included. In conclusion, including family history greatly improves the accuracy of polygenic risk scores, particularly in diverse populations.
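As a rough illustration of how a logistic model can combine the two predictors, the sketch below computes a predicted disease probability from a standardized PRS and a family-history term. The encoding of FH as a count of affected first-degree relatives, the function name, and all coefficient values are hypothetical choices made for illustration; they are not estimates from the study, and the study's actual model specification may differ.

```python
import math


def predict_risk(prs, n_affected_relatives, intercept, beta_prs, beta_fh):
    """Predicted disease probability from a logistic model on PRS and FH.

    logit(P) = intercept + beta_prs * PRS + beta_fh * FH
    where FH is encoded here (an assumption) as the number of affected
    first-degree relatives.
    """
    logit = intercept + beta_prs * prs + beta_fh * n_affected_relatives
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients, chosen only to show the mechanics.
COEFS = {"intercept": -2.5, "beta_prs": 0.4, "beta_fh": 0.7}

# Same PRS (one standard deviation above the mean), with and without FH.
risk_no_fh = predict_risk(prs=1.0, n_affected_relatives=0, **COEFS)
risk_with_fh = predict_risk(prs=1.0, n_affected_relatives=1, **COEFS)
```

In practice the coefficients would be fit in training data, and additional covariates could enter the same linear predictor, which is one reason a logistic formulation is convenient when covariates are included.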