Polygenic risk scores (PRS) based on European training data suffer reduced accuracy in non-European target populations, exacerbating health disparities. This loss of accuracy predominantly stems from linkage disequilibrium (LD) differences, minor allele frequency (MAF) differences (including population-specific SNPs), and/or causal effect size differences. PRS based on training data from the non-European target population do not suffer from these limitations, but are currently limited by much smaller training sample sizes. Here, we propose PolyPred, a method that improves cross-population polygenic prediction by combining two complementary predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing LD differences; and BOLT-LMM, a published predictor. In the special case where a large training sample is available in the non-European target population (or a closely related population), we propose PolyPred+, which further incorporates the non-European training data, addressing MAF differences and causal effect size differences. PolyPred and PolyPred+ require individual-level training data (for their BOLT-LMM component), but we also propose analogous methods that replace the BOLT-LMM component with summary statistic-based components if only summary statistics are available. We applied PolyPred to 49 diseases and complex traits in 4 UK Biobank populations using UK Biobank British training data (average N=325K), and observed statistically significant average relative improvements in prediction accuracy vs. BOLT-LMM ranging from +7% in South Asians to +32% in Africans (and vs. LD-pruning + P-value thresholding (P+T) ranging from +77% to +164%), consistent with simulations.
We applied PolyPred+ to 23 diseases and complex traits in UK Biobank East Asians using both UK Biobank British (average N=325K) and Biobank Japan (average N=124K) training data, and observed statistically significant average relative improvements in prediction accuracy of +24% vs. BOLT-LMM and +12% vs. PolyPred. The summary statistic-based analogues of PolyPred and PolyPred+ attained similar improvements. In conclusion, PolyPred and PolyPred+ improve cross-population polygenic prediction accuracy, ameliorating health disparities.
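As a rough illustration of what "combining two complementary predictors" and "relative improvement in prediction accuracy" can mean in practice, the sketch below fits least-squares mixing weights for two score columns on a small training subset and compares held-out accuracy with a single score. This is a minimal sketch under stated assumptions, not the PolyPred implementation: the abstract does not say how the predictors are combined, so ordinary least squares is an assumption here, the data are simulated, accuracy is taken to be the squared correlation, and every function and variable name is hypothetical.

```python
import numpy as np


def r2(y, pred):
    """Squared Pearson correlation between phenotype and prediction."""
    return float(np.corrcoef(y, pred)[0, 1] ** 2)


def fit_mixing_weights(y, scores):
    """Least-squares intercept and weights for the score columns (assumed combiner)."""
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def combine(scores, coef):
    """Apply the fitted intercept and weights to a matrix of scores."""
    return coef[0] + scores @ coef[1:]


def relative_improvement(r2_new, r2_base):
    """Relative gain in accuracy, e.g. 0.12 vs 0.10 gives +20%."""
    return (r2_new - r2_base) / r2_base


# Simulated data only: two noisy scores that track the same genetic value.
rng = np.random.default_rng(0)
n = 2000
genetic_value = rng.normal(size=n)
y = genetic_value + rng.normal(size=n)
score_a = genetic_value + rng.normal(scale=1.5, size=n)  # stand-in for predictor 1
score_b = genetic_value + rng.normal(scale=1.5, size=n)  # stand-in for predictor 2
scores = np.column_stack([score_a, score_b])

# Fit weights on a small subset, evaluate on the remaining samples.
train, test = slice(0, 500), slice(500, n)
coef = fit_mixing_weights(y[train], scores[train])
combined = combine(scores[test], coef)

r2_single = r2(y[test], score_b[test])
r2_combined = r2(y[test], combined)
print(f"single: {r2_single:.3f}  combined: {r2_combined:.3f}  "
      f"relative improvement: {relative_improvement(r2_combined, r2_single):+.0%}")
```

Fitting the weights on one subset and scoring on another keeps the accuracy estimate from being inflated by the weight fitting itself; the simulated gain has no bearing on the figures reported in the abstract.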
Marfan syndrome (MFS) is a rare autosomal dominant connective tissue disorder related to variants in the FBN1 gene. Prognosis depends on the risk of aortic dissection following aneurysm. MFS clinical variability is notable, for age of onset as well as severity and number of clinical manifestations. To identify genetic modifiers, we combined genome-wide approaches in 1070 clinically well-characterized FBN1 disease-causing variant carriers: (1) an FBN1 eQTL analysis in 80 fibroblasts of FBN1 stop variant carriers, (2) a linkage analysis, (3) a kinship matrix association study in 14 clinically concordant and discordant sib-pairs, (4) a genome-wide association study and (5) whole exome sequencing in 98 extreme phenotype samples. Three genetic mechanisms of variability were found. First, a new genotype/phenotype correlation was identified, with an excess of loss-of-cysteine variants (P = 0.004) in severely affected subjects. Second, a further pathogenic event in another thoracic aortic aneurysm gene or the COL4A1 gene (known to be involved in cerebral aneurysm) was found in nine individuals. Third, a polygenic model involving at least nine modifier loci (named gMod-M1-9) was observed through cross-mapping of results. Notably, gMod-M2 co-localizes with PRKG1, in which activating variants have already been described in thoracic aortic aneurysm, and gMod-M3 co-localizes with a metalloprotease (proteins of extra-cellular matrix regulation) cluster. Our results represent a major advance in understanding the complex genetic architecture of MFS and provide the first steps toward prediction of clinical evolution.
Correspondence should be addressed to P.-R.L. (poruloh@broadinstitute.org) or A.L.P. (aprice@hsph.harvard.edu).
Biobank-based genome-wide association studies are enabling exciting insights in complex trait genetics, but much uncertainty remains over best practices for optimizing statistical power and computational efficiency in GWAS while controlling confounders. Here, we introduce a much faster version of our BOLT-LMM Bayesian mixed model association method, capable of running analyses of the full UK Biobank cohort in a few days on a single compute node, and show that it produces highly powered, robust test statistics when run on all 459K European samples (retaining related individuals). When used to conduct a GWAS for height in UK Biobank, BOLT-LMM achieved power equivalent to linear regression on 650K samples, a 93% increase in effective sample size versus the common practice of analyzing unrelated British samples using linear regression (UK Biobank documentation; Bycroft et al. bioRxiv). Across a broader set of 23 highly heritable traits, the total number of independent GWAS loci detected increased from 5,839 to 10,759, an 84% increase. We recommend the use of BOLT-LMM (retaining related individuals) for biobank-scale analyses, and we have publicly released BOLT-LMM summary association statistics for the 23 traits analyzed as a resource for all researchers.
bioRxiv preprint first posted online Sep. 27, 2017; doi: http://dx.doi.org/10.1101/194944. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
To the Editor: Despite recent work highlighting the advantages of linear mixed model (LMM) methods for genome-wide association studies in data sets containing relatedness or population structure [1][2][3], much uncertainty remains about best practices for optimizing GWAS power while controlling confounders.
Several recent studies of the interim UK Biobank data set [4] (∼150,000 samples) removed >20% of samples by applying filters for relatedness or genetic ancestry, and/or used linear regression in preference to mixed model association. These issues are exacerbated in the full UK Biobank data set (∼500,000 samples), in which suggested sample exclusions decrease sample size by nearly 30% [5]. Here, we release a much faster version of our BOLT-LMM Bayesian mixed model association method [3] and show that it can be applied with minimal sample exclusions and achieves greatly superior power compared to common practices for analyzing UK Biobank data. In analyses of 23 highly heritable UK Biobank phenotypes (Supplementary Table 1 Table 2). These gains in power were driven only partially by the increased number of samples analyzed; we observed that BOLT-LMM achieved effective sample sizes as high as ∼700,000 by conditioning on polygenic predictions from genome-wide SNPs, which effectively reduces noise in an association test [2,3,6] (Fig. 1b, Supplementary Fig. 1, and Supplementary Table 3). (We estimated effective sam...
The SARS-CoV-2 pandemic has caused over 1 million deaths globally, mostly due to acute lung injury and acute respiratory distress syndrome, or direct complications resulting in multiple-organ failures. Little is known about the host tissue immune and cellular responses associated with COVID-19 infection, symptoms, and lethality. To address this, we collected tissues from 11 organs during the clinical autopsy of 17 individuals who succumbed to COVID-19, resulting in a tissue bank of approximately 420 specimens. We generated comprehensive cellular maps capturing COVID-19 biology related to patients' demise through single-cell and single-nucleus RNA-Seq of lung, kidney, liver and heart tissues, and further contextualized our findings through spatial RNA profiling of distinct lung regions. We developed a computational framework that incorporates removal of ambient RNA and automated cell type annotation to facilitate comparison with other healthy and diseased tissue atlases. In the lung, we uncovered significantly altered transcriptional programs within the epithelial, immune, and stromal compartments and cell-intrinsic changes in multiple cell types relative to lung tissue from healthy controls. We observed evidence of: alveolar type 2 (AT2) differentiation replacing depleted alveolar type 1 (AT1) lung epithelial cells, as previously seen in fibrosis; a concomitant increase in myofibroblasts reflective of defective tissue repair; and putative TP63+ intrapulmonary basal-like progenitor (IPBLP) cells, similar to cells identified in H1N1 influenza, that may serve as an emergency cellular reserve for severely damaged alveoli. Together, these findings suggest the activation and failure of multiple avenues for regeneration of the epithelium in these terminal lungs. SARS-CoV-2 RNA reads were enriched in lung mononuclear phagocytic cells and endothelial cells, and these cells expressed distinct host response transcriptional programs.
We corroborated the compositional and transcriptional changes in lung tissue through spatial analysis of RNA profiles in situ and distinguished unique tissue host responses between regions with and without viral RNA, and in COVID-19 donor tissues relative to healthy lung. Finally, we analyzed genetic regions implicated in COVID-19 GWAS with transcriptomic data to implicate specific cell types and genes associated with disease severity. Overall, our COVID-19 cell atlas is a foundational dataset to better understand the biological impact of SARS-CoV-2 infection across the human body and empowers the identification of new therapeutic interventions and prevention strategies.