Disease variants identified by genome-wide association studies (GWAS) tend to overlap with expression quantitative trait loci (eQTLs), but it remains unclear whether this overlap is driven by gene expression levels mediating genetic effects on disease. Here we introduce a new method, mediated expression score regression (MESC), to estimate disease heritability mediated by the cis-genetic component of gene expression levels. We applied MESC to GWAS summary statistics for 42 traits (average N = 323K) and cis-eQTL summary statistics for 48 tissues from the Genotype-Tissue Expression (GTEx) consortium. Averaging across traits, only 11±2% of heritability was mediated by assayed gene expression levels. Expression-mediated heritability was enriched in genes with evidence of selective constraint and genes with disease-appropriate annotations. Our results demonstrate that assayed bulk-tissue eQTLs, though disease-relevant, cannot explain the majority of disease heritability.
Disease variants identified by genome-wide association studies (GWAS) tend to overlap with expression quantitative trait loci (eQTLs). However, it remains unclear whether this overlap is driven by mediation of genetic effects on disease by expression levels, or whether it primarily reflects pleiotropic relationships instead. Here we introduce a new method, mediated expression score regression (MESC), to estimate disease heritability mediated by the cis-genetic component of assayed steady-state gene expression levels, using summary association statistics from GWAS and eQTL studies. We show that MESC produces robust estimates of expression-mediated heritability across a wide range of simulations. We applied MESC to GWAS summary statistics for 42 diseases and complex traits (average N = 323K) and cis-eQTL data across 48 tissues from the GTEx consortium. We determined that a statistically significant but low proportion of disease heritability (mean estimate 11% with S.E. 2%) is mediated by the cis-genetic component of assayed gene expression levels, with substantial variation across diseases (point estimates from 0% to 38%). We further partitioned expression-mediated heritability across various gene sets. We observed an inverse relationship between cis-heritability of expression and disease heritability mediated by expression, suggesting that genes with weaker eQTLs have larger causal effects on disease. Moreover, we observed broad patterns of expression-mediated heritability enrichment across functional gene sets that implicate specific gene sets in disease, including loss-of-function intolerant genes and FDA-approved drug targets. Our results demonstrate that eQTLs estimated from steady-state expression levels in bulk tissues are informative of regulatory disease mechanisms, but that such eQTLs are insufficient to explain the majority of disease heritability. Instead, additional assays are necessary to more fully capture the regulatory effects of GWAS variants.
Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming. Here we report ImReP, a computational method for rapid and accurate profiling of the Ig repertoire, including the complementarity-determining region 3 (CDR3), using regular RNA sequencing data such as those from 8,555 samples across 53 tissue types from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. Using ImReP and GTEx v6 data, we generate a collection of 3.6 million Ig sequences, termed the atlas of immunoglobulin repertoires (TAIR), across a broad range of tissue types that often do not have reported Ig repertoire information. Moreover, the flow of Ig clonotypes and inter-tissue repertoire similarities across immune-related tissues are also evaluated. In summary, TAIR is one of the largest collections of CDR3 sequences and tissue types, and should serve as an important resource for studying immunological diseases.
Background: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. Results: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. Conclusions: In terms of accuracy, we find that method performance varies substantially across different types of datasets, with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.