Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15–17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype–phenotype map than previously anticipated.

… (Extended Data Fig. 5). These results show the value of large sample sizes in blood for detecting trans-mQTLs regardless of the tissue.

Trans-mQTL SNPs and DNAm exhibit patterned TF binding

Recent studies have uncovered multiple types of transcription factor (TF)–DNA interactions influenced by DNAm, including the binding of DNAm-sensitive TFs [26–28] and cooperativity between TFs [27,29]. To gain insight into how SNPs induce long-range DNAm changes, we mapped enrichments for DNAm sites and SNPs across binding sites for 171 TFs in 27 cell types [30,31]. We found strong enrichments for most TFs and cell types among DNAm sites with a trans association (cis + trans: 55%; trans only: 80%; cis only: 18%) and among cis-acting SNPs (cis only: 96%; cis + trans: 91%; trans only: 1%) (Fig. 2b, Supplementary Tables 7 and 8, and Supplementary Figs. 22 and 23).
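The abstract separates short-range (cis) mQTLs from long-range (trans) ones, and the later TF analysis keeps only interchromosomal trans-mQTLs. Below is a minimal sketch of that labeling. It assumes a distance rule: an association counts as cis when the SNP and the DNAm site share a chromosome and lie within 1 Mb of each other, and as trans otherwise. The 1 Mb window is an assumption because this excerpt does not state the threshold. `MQTL`, `classify_mqtl`, `is_interchromosomal` and `trans_fraction` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed cis window (not stated in the excerpt); adjust to the study's definition.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class MQTL:
    """One SNP-DNAm site association, with genomic positions in bp."""
    snp_chrom: str
    snp_pos: int
    cpg_chrom: str
    cpg_pos: int


def classify_mqtl(m: MQTL, window: int = CIS_WINDOW_BP) -> str:
    """'cis' if SNP and DNAm site share a chromosome within `window` bp, else 'trans'."""
    if m.snp_chrom == m.cpg_chrom and abs(m.snp_pos - m.cpg_pos) <= window:
        return "cis"
    return "trans"


def is_interchromosomal(m: MQTL) -> bool:
    """True when the SNP and the DNAm site sit on different chromosomes."""
    return m.snp_chrom != m.cpg_chrom


def trans_fraction(mqtls: list[MQTL], window: int = CIS_WINDOW_BP) -> float:
    """Fraction of associations labeled trans (0.0 for an empty list)."""
    if not mqtls:
        return 0.0
    n_trans = sum(classify_mqtl(m, window) == "trans" for m in mqtls)
    return n_trans / len(mqtls)


if __name__ == "__main__":
    examples = [
        MQTL("1", 1_000_000, "1", 1_500_000),  # same chromosome, 500 kb apart
        MQTL("1", 1_000_000, "1", 3_000_000),  # same chromosome, 2 Mb apart
        MQTL("2", 5_000_000, "7", 5_000_000),  # different chromosomes
    ]
    for m in examples:
        print(m, classify_mqtl(m), "interchromosomal" if is_interchromosomal(m) else "")
    print(f"trans fraction: {trans_fraction(examples):.2f}")
```

Under this rule, trans-mQTLs include both distant same-chromosome pairs and interchromosomal pairs. Filtering with `is_interchromosomal` corresponds to the subset used for the TF-pair analysis.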
Consistent with the observation that trans-only DNAm sites are enriched for CpG islands (Supplementary Fig. 13), DNAm sites that overlap TF-binding sites (TFBSs) were relatively hypomethylated (weighted mean DNAm levels = 21% versus 52%, P < 2.2 × 10⁻¹⁶; Supplementary Fig. 24).

Next, we hypothesized that, if a trans-mQTL is driven by TF activity [8,10], then particular TF–TF pairs may exhibit preferential enrichment [32]. An mQTL has a pair of TFBS annotations [31], one for the SNP and one for the DNAm site. We evaluated whether the annotation pairs among 18,584 interchromosomal trans-mQTLs were associated with TF binding in a nonrandom pattern (Supplementary Note and Extended Data Fig. 6a,b). We found that 6.1% (22,962 of 378,225) of possible pairwise combinations of SNP–DNAm site annotations were significantly over- or underrepresented relative to chance after strict multiple testing correction (Supplementary Note, Supplementary Table 9 and Extended Data Fig. 6c).

After accounting for abundance and other characteristics, the strongest pairwise enrichments involved sites close to TFBSs for proteins in the cohesin complex, ...
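The exact test is described in a Supplementary Note that is not part of this excerpt. To show what "over- or underrepresented than expected by chance" can mean for SNP–DNAm site annotation pairs, here is a generic label-shuffling permutation test, swapped in for illustration. Each trans-mQTL carries a set of TF labels on the SNP side and a set on the DNAm-site side. The sketch counts every possible label pair, then reshuffles the DNAm-site annotations across mQTLs to build a null distribution. The function names and toy TF labels are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def pair_counts(snp_annots, cpg_annots):
    """Count (SNP-side TF, DNAm-site-side TF) annotation pairs over mQTLs.

    snp_annots[i] / cpg_annots[i] are sets of TF labels whose binding sites
    overlap the SNP / DNAm site of the i-th trans-mQTL."""
    counts = Counter()
    for snp_tfs, cpg_tfs in zip(snp_annots, cpg_annots):
        counts.update(product(snp_tfs, cpg_tfs))
    return counts


def permutation_enrichment(snp_annots, cpg_annots, n_perm=1000, seed=0):
    """Return {pair: (observed, null_mean, two-sided empirical p)} for all pairs.

    Null: shuffle DNAm-site annotations across mQTLs. This keeps how often
    each TF label occurs on each side fixed but breaks the SNP-site pairing."""
    rng = random.Random(seed)
    snp_labels = sorted(set().union(*snp_annots))
    cpg_labels = sorted(set().union(*cpg_annots))
    pairs = list(product(snp_labels, cpg_labels))
    observed = pair_counts(snp_annots, cpg_annots)

    null = {pair: [] for pair in pairs}
    shuffled = list(cpg_annots)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        perm = pair_counts(snp_annots, shuffled)
        for pair in pairs:
            null[pair].append(perm[pair])

    results = {}
    for pair in pairs:
        obs = observed[pair]  # Counter returns 0 for pairs never observed
        draws = null[pair]
        expected = sum(draws) / n_perm
        # Count permutations at least as far from the null mean as the observation.
        extreme = sum(abs(d - expected) >= abs(obs - expected) for d in draws)
        results[pair] = (obs, expected, (extreme + 1) / (n_perm + 1))
    return results


if __name__ == "__main__":
    # Toy data with hypothetical TF labels: SNPs in "A" sites pair with
    # DNAm sites in "B" sites; SNPs in "C" sites pair with "D" sites.
    snp = [{"A"}] * 20 + [{"C"}] * 20
    cpg = [{"B"}] * 20 + [{"D"}] * 20
    for pair, (obs, exp, p) in sorted(permutation_enrichment(snp, cpg, 500).items()):
        print(pair, obs, round(exp, 1), round(p, 4))
```

With `n_perm` permutations the smallest attainable p-value is 1/(n_perm + 1). Correcting across hundreds of thousands of pairs (378,225 in the text) would therefore need far more permutations or an analytical null. This sketch does not claim which one the study used.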
Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40–50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes [1]. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome, and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel [2]) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10–20% (14–24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.
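"Account for 40% of phenotypic variance" in out-of-sample prediction means that a SNP-based predictor captures that share of height variance in people not used to estimate the effects. One common way to quantify this is assumed here rather than taken from the study. A polygenic score is built as the sum of allele dosages weighted by per-SNP effect sizes. Its squared Pearson correlation (R²) with height is then measured in a hold-out sample. The study may instead report incremental R² over covariates, which this sketch omits. `polygenic_score`, `r_squared` and the toy data are hypothetical.

```python
from statistics import mean


def polygenic_score(genotypes, weights):
    """Sum over SNPs of allele dosage (0, 1 or 2) times per-SNP effect size."""
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def r_squared(predicted, observed):
    """Squared Pearson correlation between a predictor and a phenotype."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mp, mo = mean(predicted), mean(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    denom = var_p * var_o
    if denom == 0:
        raise ValueError("predictor or phenotype has zero variance")
    return cov * cov / denom


if __name__ == "__main__":
    # Hypothetical hold-out sample: 4 people x 3 SNPs (allele dosages),
    # effect sizes from a separate discovery GWAS, and heights in cm.
    genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]
    weights = [0.4, -0.1, 0.3]
    heights = [170.0, 168.5, 175.2, 171.0]
    score = polygenic_score(genotypes, weights)
    print(f"out-of-sample R^2 = {r_squared(score, heights):.3f}")
```

Weights and phenotypes must come from different samples. Computing R² in the same individuals used to estimate `weights` would overstate the variance explained.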
Background
Tobacco smoking is a risk factor for multiple diseases, including cardiovascular disease and diabetes. Many smoking-associated signals have been detected in the blood methylome, but the extent to which these changes extend to metabolically relevant tissues, and whether they impact gene expression or metabolic health, remains unclear.

Methods
We investigated smoking-associated DNA methylation and gene expression variation in adipose tissue biopsies from 542 healthy female twins. Replication, tissue specificity, and longitudinal stability of the smoking-associated effects were explored in additional adipose, blood, skin, and lung samples. We characterized the impact of adipose tissue smoking methylation and expression signals on metabolic disease risk phenotypes, including visceral fat.

Results
We identified 42 smoking-methylation and 42 smoking-expression signals; five genes (AHRR, CYP1A1, CYP1B1, CYTL1, F2RL3) were both hypomethylated and upregulated in current smokers. CYP1A1 gene expression achieved 95% prediction performance for current smoking status. We validated and replicated a proportion of the signals in additional primary tissue samples, identifying tissue-shared effects. Smoking leaves systemic imprints on DNA methylation that persist after smoking cessation, with stronger but shorter-lived effects on gene expression. Metabolic disease risk traits such as visceral fat and android-to-gynoid ratio were associated with methylation at smoking markers with functional impacts on expression, such as CYP1A1, and at tissue-shared smoking signals, such as NOTCH1. At smoking signals, BHLHE40 and AHRR DNA methylation and gene expression levels in current smokers were predictive of future gain in visceral fat upon smoking cessation.

Conclusions
Our results provide the first comprehensive characterization of coordinated DNA methylation and gene expression markers of smoking in adipose tissue.
The findings relate to human metabolic health and give insights into the widespread health consequences of smoking outside of the lung.

Electronic supplementary material
The online version of this article (10.1186/s13148-018-0558-0) contains supplementary material, which is available to authorized users.
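The abstract does not define "95% prediction performance" for CYP1A1 expression. A common metric for how well a single continuous marker separates two groups is the area under the ROC curve (AUC). Assuming that interpretation, the sketch below computes AUC with the Mann–Whitney formulation: the probability that a randomly chosen current smoker has higher expression than a randomly chosen non-current smoker, with ties counted as one half. `roc_auc` and the expression values are hypothetical, not study data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    labels: 1 for the positive class (current smoker), 0 otherwise.
    Compares every positive with every negative; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical CYP1A1 expression (arbitrary units) and status
    # (1 = current smoker, 0 = not a current smoker).
    expression = [8.1, 7.4, 9.0, 3.2, 2.8, 4.1, 3.3, 3.5]
    current = [1, 1, 1, 0, 0, 0, 1, 0]
    print(f"AUC = {roc_auc(expression, current):.2f}")
```

The pairwise loop is O(n·m), which is fine for a few hundred samples. A rank-based computation scales better for large cohorts.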
Aim: Smoking strongly influences DNA methylation, with current and never smokers exhibiting different methylation profiles. Methods: To advance the practical applicability of the smoking-associated methylation signals, we used machine learning methodology to train a classifier for smoking status prediction. Results: We show the prediction performance of our classifier on three independent whole-blood datasets, demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation. Conclusion: The major contribution of our classifier is its global applicability without the need for users to determine a threshold value for each dataset to predict smoking status. We provide an R package, EpiSmokEr (Epigenetic Smoking status Estimator), facilitating the use of our classifier to predict smoking status in future studies.
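The abstract does not describe the classifier's form, but it does explain why no per-dataset threshold is needed. A multiclass model outputs one probability per smoking-status class and simply picks the most probable one. A single continuous smoking score, by contrast, needs a cut-off. As an illustration only, the sketch assumes a multinomial logistic (softmax) model over DNAm beta values. The class labels, weights, intercepts and `predict_status` are hypothetical and are not the EpiSmokEr package's code or coefficients.

```python
import math

# Hypothetical class labels for illustration.
CLASSES = ("never", "former", "current")


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_status(beta_values, weights, intercepts):
    """Return (most probable class, class probabilities).

    beta_values: DNAm beta values (0-1) at the model's CpG sites.
    weights[k], intercepts[k]: coefficients and bias for CLASSES[k].
    No dataset-specific cut-off is used: the highest-probability class wins."""
    if len(weights) != len(CLASSES) or len(intercepts) != len(CLASSES):
        raise ValueError("need one weight vector and intercept per class")
    scores = [
        b0 + sum(w * x for w, x in zip(wk, beta_values))
        for wk, b0 in zip(weights, intercepts)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


if __name__ == "__main__":
    # Illustrative two-site model: lower methylation pushes toward "current".
    weights = [[5.0, 5.0], [0.0, 0.0], [-5.0, -5.0]]
    intercepts = [-5.0, 1.0, 5.0]
    for sample in ([0.9, 0.9], [0.5, 0.5], [0.1, 0.1]):
        label, probs = predict_status(sample, weights, intercepts)
        print(sample, label, [round(p, 3) for p in probs])
```

Because the decision is an argmax over probabilities that always sum to one, the same fitted model can be applied to a new dataset without re-tuning a threshold. Its accuracy there still depends on how comparable the new data are.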