Recent studies of de novo mutations (DNMs) have demonstrated that multiple early-onset diseases share risk genes. We may therefore leverage information from one trait to improve the statistical power to identify genes for another trait. However, few methods can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study that jointly analyzes data from congenital heart disease (CHD) and autism. Joint analysis identified 23 genes for CHD, including 12 novel genes, which is substantially more than single-trait analysis and leads to novel insights into CHD etiology.
De novo variants (DNVs) with deleterious effects have proved informative in identifying risk genes for early-onset diseases such as congenital heart disease (CHD). A number of statistical methods have been proposed for family-based or case/control studies to identify risk genes by screening for genes with more DNVs than expected by chance in whole-exome sequencing (WES) studies. However, statistical power is still limited for cohorts with thousands of subjects. Under the hypothesis that connected genes in protein-protein interaction (PPI) networks are more likely to share similar disease association status, we developed a Markov Random Field model that leverages information from publicly available PPI databases to increase power in identifying risk genes. We identified 46 candidate genes with at least 1 DNV in the CHD study cohort, including 18 known human CHD genes and 35 genes highly expressed in the developing mouse heart. Our results may provide new insight into the shared protein functionality among risk genes for CHD.
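The idea of screening for genes with "more DNVs than expected by chance" can be illustrated with a simple Poisson tail test. This is a minimal sketch of the general baseline, not the Markov Random Field model described in the abstract. It assumes the expected count for a gene is 2 × (number of trios) × (per-gene, per-chromosome mutation rate), and the function names, cohort size, mutation rate, and observed count are all hypothetical values chosen for illustration.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    total = 0.0
    k = observed
    while True:
        # Poisson probability mass at k, computed in log space for stability.
        term = math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
        total += term
        # Past the mode the terms shrink quickly; stop once they are negligible.
        if k > expected and term <= 1e-16 * total:
            break
        k += 1
    return min(total, 1.0)


def expected_dnv_count(n_trios: int, mutation_rate: float) -> float:
    """Expected DNVs in one gene under the null (assumed model: 2 * N * mu)."""
    return 2 * n_trios * mutation_rate


# Hypothetical inputs, not taken from the study.
n_trios = 2500          # assumed cohort size
mutation_rate = 2e-5    # assumed per-gene, per-chromosome damaging-DNV rate
observed_dnvs = 3       # assumed observed count in the gene

p_value = poisson_upper_tail(observed_dnvs, expected_dnv_count(n_trios, mutation_rate))
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value here only says the gene carries more DNVs than the assumed background rate predicts. Network-based models such as the one in the abstract go further by sharing information between interacting genes.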
Background: Genome-wide association studies (GWAS) have succeeded in identifying tens of thousands of genetic variants associated with complex human traits during the past decade; however, they are still hampered by limited statistical power and difficulties in biological interpretation. With the recent progress in expression quantitative trait loci (eQTL) studies, transcriptome-wide association studies (TWAS) provide a framework to test for gene-trait associations by integrating information from GWAS and eQTL studies. Results: In this review, we introduce the general framework of TWAS, the relevant resources, and the computational tools. We also discuss extensions of the original TWAS methods. Furthermore, we briefly introduce methods that are closely related to TWAS, including MR-based methods and colocalization approaches, and discuss the connections and differences between these approaches. Conclusion: Finally, we summarize the strengths, limitations, and potential directions of TWAS.
BACKGROUND: Ischemic stroke (IS) is a highly heritable trait, and genome-wide association studies have identified several commonly occurring susceptibility risk loci for this condition. However, there are limited data on the contribution of rare genetic variation to IS. METHODS: We conducted an exome-wide study using whole-exome sequencing data from 152 058 UK Biobank participants, including 1777 IS cases. We performed single-variant analyses for rare variants and gene-based analyses for loss-of-function and deleterious missense rare variants. We validated these results through (1) gene-based testing using summary statistics from MEGASTROKE, a genome-wide association study of IS that included 67 162 IS cases and 454 450 controls; (2) gene-based testing using individual-level data from 1706 IS survivors, including 142 recurrent IS cases, enrolled in the VISP trial (Vitamin Intervention for Stroke Prevention); and (3) gene-based testing against neuroimaging phenotypes related to cerebrovascular disease using summary-level data from 42 310 UK Biobank participants with available magnetic resonance imaging data. RESULTS: In single-variant association analyses, none of the evaluated variants were associated with IS at genome-wide significance levels (P<5×10⁻⁸). In the gene-based analysis focused on loss-of-function and deleterious missense variants, rare genetic variation at CYP2R1 was significantly associated with IS risk (P=2.6×10⁻⁶), exceeding the Bonferroni-corrected threshold for 16 074 tests (P<3.1×10⁻⁶). Validation analyses indicated that CYP2R1 was associated with IS risk in MEGASTROKE (gene-based test, P=0.003), with IS recurrence in the VISP trial (gene-based test, P=0.001), and with neuroimaging traits (white matter hyperintensity, mean diffusivity, and fractional anisotropy) in the UK Biobank neuroimaging study (all gene-based tests, P<0.05).
CONCLUSIONS: Because CYP2R1 plays an important role in vitamin D metabolism and existing observational evidence suggests an association between vitamin D levels and cerebrovascular disease, our results support a role of this pathway in the occurrence of IS.
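The Bonferroni-corrected threshold quoted in the RESULTS of this abstract can be reproduced with a short calculation. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state explicitly but which is consistent with the reported threshold; the function and variable names are illustrative only.

```python
ALPHA = 0.05        # assumed family-wise error rate (not stated in the abstract)
N_TESTS = 16_074    # number of gene-based tests reported in the abstract
P_CYP2R1 = 2.6e-6   # gene-based P value reported for CYP2R1


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)
is_significant = P_CYP2R1 < threshold
print(f"threshold = {threshold:.2e}, CYP2R1 significant = {is_significant}")
```

Under this assumption the threshold comes out at about 3.1×10⁻⁶, matching the reported value, and the CYP2R1 P value falls below it.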