Rare variant association tests (RVAT) have been developed to study the contribution of rare variants, which are now widely accessible through high-throughput sequencing technologies. RVAT require aggregating rare variants into testing units and filtering variants to retain only the most likely causal ones. In the exome, genes are natural testing units and variants are usually filtered based on their functional consequences. However, when dealing with whole-genome sequence (WGS) data, both steps are challenging. No natural biological unit is available for aggregating rare variants. Sliding-window procedures have been proposed to circumvent this difficulty; however, they are blind to biological information and result in a large number of tests. We propose a new strategy to perform RVAT on WGS data: "RAVA-FIRST" (RAre Variant Association using Functionally-InfoRmed STeps), comprising three steps. (1) New testing units, referred to as "CADD regions", are defined genome-wide based on functionally-adjusted Combined Annotation Dependent Depletion (CADD) scores of variants observed in the gnomAD populations. (2) A region-dependent filtering of rare variants is applied in each CADD region. (3) A functionally-informed burden test is performed with subscores computed for each genomic category within each CADD region. On both simulations and real data, RAVA-FIRST was found to outperform other WGS-based RVAT. Applied to a WGS dataset of venous thromboembolism patients, we identified an intergenic region on chromosome 18 enriched for rare variants in early-onset patients. This region, which was missed by standard sliding-window procedures, is included in a TAD region that contains a
Major depressive disorder (MDD), bipolar disorder (BD) and schizophrenia (SCZ) are accompanied by an increased risk of cardiovascular diseases, including venous thromboembolism (VTE). The reasons for this are complex and include obesity, smoking and the use of hormone and psychotropic medications. Genetic studies increasingly provide evidence of shared genetic risk between psychiatric and cardiometabolic illness. This study aimed to determine whether genetic predisposition to MDD, BD or SCZ was associated with an increased risk of VTE. Genetic correlations computed from the largest genome-wide meta-analysis summary statistics for MDD, BD and SCZ (Psychiatric Genetics Consortium) and from a recent genome-wide meta-analysis of VTE (INVENT consortium) demonstrated a positive association between VTE and MDD, but not BD or SCZ. The same summary statistics were used to construct polygenic risk scores for MDD, BD and SCZ in UK Biobank participants of self-reported white British ancestry. These scores were assessed for their impact on self-reported VTE risk (10786 cases, 285124 controls) using logistic regression, in sex-specific and sex-combined analyses. We identified significant positive associations between polygenic risk for MDD and risk of VTE in men, in women and in sex-combined analyses, independent of known risk factors. Secondary analyses demonstrated that this association was not driven by those with lifetime experience of mental illness. Meta-analyses of individual data from six additional independent cohorts replicated the sex-combined association. This report provides evidence for shared biological mechanisms leading to MDD and VTE, and suggests that, in the absence of genetic data, family history of MDD might be considered when assessing risk of VTE.
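As a rough illustration of the approach described in this abstract, the sketch below builds a polygenic risk score as a weighted sum of risk-allele dosages and regresses case status on the standardized score with a one-predictor logistic model. It is a minimal sketch, not the study's pipeline: the weights, dosages and case labels are invented toy values, the function names (`polygenic_score`, `standardize`, `fit_logistic`) are hypothetical, and no covariates or sex stratification are included.

```python
import math

def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) for one individual."""
    return sum(w * d for w, d in zip(weights, dosages))

def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data (assumed): per-variant effect sizes and per-individual dosages.
weights = [0.10, 0.05, 0.20]
dosages = [
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
    [0, 0, 1], [2, 1, 1], [1, 2, 1], [2, 2, 2],
]
status = [0, 0, 1, 0, 1, 1, 0, 1]  # 1 = case, 0 = control (invented)

scores = [polygenic_score(d, weights) for d in dosages]
z_scores = standardize(scores)
b0, b1 = fit_logistic(z_scores, status)
print(f"log-odds change per SD of score: {b1:.3f}")
```

A positive slope `b1` means that higher scores go with higher odds of being a case in this toy sample; a real analysis would adjust for covariates and rely on an established statistics package rather than a hand-written fit.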
Rare variant association tests (RVAT) have been developed to study the contribution of rare variants, which are now widely accessible through high-throughput sequencing technologies. RVAT require aggregating rare variants into testing units and filtering variants to retain only the most likely causal ones. In the exome, genes are natural testing units and variants are usually filtered based on their functional consequences. However, when dealing with whole-genome sequence (WGS) data, both steps are challenging. No natural biological unit is available for aggregating rare variants. Sliding-window procedures have been proposed to circumvent this difficulty; however, they are blind to biological information and result in a large number of tests. We propose a new strategy to perform RVAT on WGS data: “RAVA-FIRST” (RAre Variant Association using Functionally-InfoRmed STeps), comprising three steps. (1) New testing units, referred to as “CADD regions”, are defined genome-wide based on functionally-adjusted Combined Annotation Dependent Depletion (CADD) scores of variants observed in the gnomAD populations. (2) A region-dependent filtering of rare variants is applied in each CADD region. (3) A functionally-informed burden test is performed with sub-scores computed for each genomic category within each CADD region. On both simulations and real data, RAVA-FIRST was found to outperform other WGS-based RVAT.
Applied to a WGS dataset of venous thromboembolism patients, we identified an intergenic region on chromosome 18 that is enriched for rare variants in early-onset patients and that was missed by standard sliding-window procedures. RAVA-FIRST enables new investigations of rare non-coding variants in complex diseases, facilitated by its implementation in the R package Ravages.
Author Summary
Technological progress has made whole-genome sequencing possible at an unprecedented scale, opening up the possibility of exploring the role of low-frequency genetic variants in common diseases. The challenge is now methodological and requires the development of novel methods and strategies to analyse sequencing data that are not limited to assessing the role of coding variants. With RAVA-FIRST, we propose a novel strategy to investigate the role of rare variants across the whole genome that takes advantage of biological information. In particular, RAVA-FIRST relies on testing units that go beyond genes to gather rare variants in the association tests. In this work, we show that this new strategy presents several advantages compared to existing methods. RAVA-FIRST offers an easy and straightforward analysis of genome-wide rare variants, especially the intergenic ones, which are frequently left behind, making it a promising tool for better understanding the biology of complex diseases.
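The filtering and sub-score steps described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the RAVA-FIRST implementation and not the API of the Ravages package: the variants, frequencies, scores and thresholds are invented, all function names are hypothetical, and a simple one-sided permutation test on the total burden stands in for the functionally-informed burden test used by the authors.

```python
import random
from collections import Counter

CATEGORIES = ("coding", "regulatory", "intergenic")

# Hypothetical variants in one region: id -> (category, MAF, adjusted score).
VARIANTS = {
    "v1": ("coding", 0.001, 25.0),
    "v2": ("regulatory", 0.004, 12.0),
    "v3": ("intergenic", 0.002, 3.0),    # below the score threshold
    "v4": ("intergenic", 0.200, 18.0),   # too common to count as rare
    "v5": ("regulatory", 0.0005, 15.0),
}

def qualifying(variants, max_maf, min_score):
    """Keep rare variants whose score reaches the region's threshold."""
    return {vid for vid, (_, maf, score) in variants.items()
            if maf <= max_maf and score >= min_score}

def subscores(carried, keep, variants):
    """Count an individual's qualifying variants per genomic category."""
    counts = Counter(variants[v][0] for v in carried if v in keep)
    return {c: counts.get(c, 0) for c in CATEGORIES}

def mean_difference(burdens, statuses):
    cases = [b for b, s in zip(burdens, statuses) if s == 1]
    controls = [b for b, s in zip(burdens, statuses) if s == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

def permutation_p(burdens, statuses, n_perm=2000, seed=0):
    """One-sided permutation p-value for a higher mean burden in cases."""
    rng = random.Random(seed)
    observed = mean_difference(burdens, statuses)
    shuffled = list(statuses)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_difference(burdens, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy cohort (assumed): (status, variants carried), 1 = case, 0 = control.
individuals = [
    (1, ["v1", "v2"]), (1, ["v5"]), (1, ["v2", "v3"]), (1, ["v4"]),
    (0, []), (0, ["v3"]), (0, ["v4", "v2"]), (0, []),
]

keep = qualifying(VARIANTS, max_maf=0.01, min_score=10.0)
statuses = [s for s, _ in individuals]
all_subscores = [subscores(c, keep, VARIANTS) for _, c in individuals]
burdens = [sum(sub.values()) for sub in all_subscores]
p_value = permutation_p(burdens, statuses)
print(all_subscores[0], burdens, round(p_value, 3))
```

In this sketch the score threshold is a single fixed number; in a region-dependent scheme it would be chosen separately for each region, and the per-category sub-scores would enter the test as separate terms rather than being summed.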