Summary
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of mRNA and miRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project – the first uniformly processed RNA-seq data from multiple human populations with high-quality genome sequences. We discovered extremely widespread genetic variation affecting regulation of the majority of genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on cellular mechanisms of regulatory and loss-of-function variation, and allowed us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
Genetic risk factors often localize to noncoding regions of the genome with unknown effects on disease etiology. Expression quantitative trait loci (eQTLs) help to explain the regulatory mechanisms underlying these genetic associations. Knowledge of the context that determines the nature and strength of eQTLs may help identify cell types relevant to pathophysiology and the regulatory networks underlying disease. Here we generated peripheral blood RNA-seq data from 2,116 unrelated individuals and systematically identified context-dependent eQTLs using a hypothesis-free strategy that does not require previous knowledge of the identity of the modifiers. Of the 23,060 significant cis-regulated genes (false discovery rate (FDR) ≤ 0.05), 2,743 (12%) showed context-dependent eQTL effects. The majority of these effects were influenced by cell type composition. A set of 145 cis-eQTLs depended on type I interferon signaling. Others were modulated by specific transcription factors binding to the eQTL SNPs.
The methylome is subject to genetic and environmental effects. Their impact may depend on sex and age, resulting in sex- and age-related physiological variation and disease susceptibility. Here we estimate the total heritability of DNA methylation levels in whole blood and estimate the variance explained by common single nucleotide polymorphisms at 411,169 sites in 2,603 individuals from twin families, to establish a catalogue of between-individual variation in DNA methylation. Heritability estimates vary across the genome (mean=19%) and interaction analyses reveal thousands of sites with sex-specific heritability as well as sites where the environmental variance increases with age. Integration with previously published data illustrates the impact of genome and environment across the lifespan at methylation sites associated with metabolic traits, smoking and ageing. These findings demonstrate that our catalogue holds valuable information on locations in the genome where methylation variation between people may reflect disease-relevant environmental exposures or genetic variation.
Most disease-associated genetic variants are noncoding, making it challenging to design experiments to understand their functional consequences. Identification of expression quantitative trait loci (eQTLs) has been a powerful approach to infer the downstream effects of disease-associated variants, but most of these variants remain unexplained. The analysis of DNA methylation, a key component of the epigenome, offers highly complementary data on the regulatory potential of genomic regions. Here we show that disease-associated variants have widespread effects on DNA methylation in trans that likely reflect differential occupancy of trans binding sites by cis-regulated transcription factors. Using multiple omics data sets from 3,841 Dutch individuals, we identified 1,907 established trait-associated SNPs that affect the methylation levels of 10,141 different CpG sites in trans (false discovery rate (FDR) < 0.05). These included SNPs that affect both the expression of a nearby transcription factor (such as NFKB1, CTCF and NKX2-3) and methylation of its respective binding site across the genome. Trans methylation QTLs effectively expose the downstream effects of disease-associated variants.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.