Background. Expression quantitative trait methylation (eQTM) analysis identifies DNA CpG sites at which methylation is associated with gene expression and may reveal molecular mechanisms of disease. The present study describes an eQTM resource of CpG-transcript pairs. Methods. DNA methylation was measured in blood samples from 1,045 Framingham Heart Study (FHS) participants using the Illumina 450K BeadChip and from 1,070 FHS participants using the Illumina EPIC array. Blood gene expression data were collected from all 2,115 participants using RNA sequencing (RNA-seq). The association between DNA methylation and gene expression was quantified for all cis (i.e., within 1 Mb) and trans (>1 Mb) CpG-transcript pairs. Significant results (p < 1E-7 for cis and p < 1E-14 for trans) were subsequently tested for enrichment of biological pathways and of clinical traits. Results. We identified 70,047 significant cis CpG-transcript pairs; the most significant eGenes (i.e., gene transcripts associated with a CpG) among them were enriched in biological pathways related to cell signaling and for 1,208 clinical traits (enrichment false discovery rate [FDR] ≤ 0.05). We also identified 246,667 significant trans CpG-transcript pairs; the most significant eGenes among them were enriched in biological pathways related to activation of the immune response and for 1,191 clinical traits (enrichment FDR ≤ 0.05). Using the significant cis CpG-transcript pairs, we found that gene expression significantly mediated the association between CpG sites and cardiometabolic traits, and we identified shared genetic regulation between the CpGs and transcripts associated with these traits. Conclusions. We developed a robust and powerful resource of eQTM CpG-transcript pairs that can help inform future functional studies that seek to understand the molecular basis of disease.
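The cis/trans split and the two significance thresholds stated in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the use of a single gene coordinate (for example, a transcription start site) to measure distance, and the treatment of pairs on different chromosomes as trans are all assumptions made here for illustration; the abstract states only the 1 Mb window and the p-value cutoffs.

```python
# Values taken from the abstract.
CIS_WINDOW_BP = 1_000_000      # cis: CpG within 1 Mb of the transcript
CIS_P_THRESHOLD = 1e-7         # significance cutoff for cis pairs
TRANS_P_THRESHOLD = 1e-14      # significance cutoff for trans pairs


def classify_pair(cpg_chrom, cpg_pos, gene_chrom, gene_pos):
    """Label a CpG-transcript pair as 'cis' or 'trans'.

    Assumption: distance is measured to one gene coordinate (gene_pos),
    and pairs on different chromosomes count as trans.
    """
    same_chrom = cpg_chrom == gene_chrom
    if same_chrom and abs(cpg_pos - gene_pos) <= CIS_WINDOW_BP:
        return "cis"
    return "trans"


def is_significant(kind, p_value):
    """Apply the cis or trans p-value threshold from the abstract."""
    threshold = CIS_P_THRESHOLD if kind == "cis" else TRANS_P_THRESHOLD
    return p_value < threshold


# Hypothetical pairs: (CpG chrom, CpG pos, gene chrom, gene pos, p-value).
pairs = [
    ("chr1", 1_500_000, "chr1", 1_900_000, 3e-9),
    ("chr1", 1_500_000, "chr1", 9_000_000, 3e-9),
    ("chr1", 1_500_000, "chr7", 1_500_000, 1e-20),
]
for cpg_chrom, cpg_pos, gene_chrom, gene_pos, p in pairs:
    kind = classify_pair(cpg_chrom, cpg_pos, gene_chrom, gene_pos)
    print(kind, is_significant(kind, p))
```

The much stricter trans threshold reflects the far larger number of trans pairs tested; the same p-value of 3e-9 passes for the cis pair above but not for the distant pair on the same chromosome.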
Alzheimer's disease (AD) is a leading cause of death among individuals over 65. Despite the many AD genetic variants detected by large genome-wide association studies (GWAS), only a limited number of causal genes have been confirmed. Conventional machine learning techniques integrate functional annotation data and GWAS signals to assign variants functional relevance probabilities. Yet a large proportion of genetic variation lies in the non-coding genome, where unsupervised and semi-supervised techniques have demonstrated a greater advantage. Furthermore, cell-type-specific approaches are needed to better understand disease etiology. Studying AD through a microglia-specific lens is more likely to reveal causal variants involved in immune pathways. Therefore, in this study, we developed a semi-supervised ensemble approach that uses microglia-specific data to prioritize non-coding variants and their target genes that play roles in immune-related AD mechanisms. We designed a transductive positive-unlabeled and negative-unlabeled learning model that employs a bagging technique to learn from unlabeled variants, generating multiple predicted probabilities of variant risk. We aggregated these predictions using a combined homogeneous-heterogeneous ensemble framework. We applied our model to AD variant data, identifying 11 risk variants acting in well-known AD genes, such as TSPAN14, INPP5D, and MS4A2. These results validated our model's performance and demonstrated a need to study these genes in the context of microglial pathways. We also proposed further experimental study of 37 potential causal variants associated with lesser-known genes. Our work has utility in predicting AD-relevant genes and variants functioning in microglia and can be generalized for application to other complex diseases.
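The general idea of transductive positive-unlabeled bagging can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' model: the base learner is a simple nearest-centroid score swapped in for brevity, the function names and the two-dimensional feature vectors are hypothetical, and the negative-unlabeled counterpart and the homogeneous-heterogeneous aggregation described in the abstract are not reproduced. In each round, a bootstrap sample of unlabeled points is treated as provisional negatives, and only the unlabeled points left out of that sample are scored; scores are then averaged across rounds.

```python
import random


def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def pu_bagging_scores(positives, unlabeled, n_rounds=200, seed=0):
    """Average out-of-bag scores in [0, 1] for each unlabeled point.

    Higher scores mean the point looks more like the known positives.
    Returns None for a point that was never out-of-bag.
    """
    rng = random.Random(seed)
    indices = range(len(unlabeled))
    sample_size = min(len(positives), len(unlabeled))
    pos_centroid = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)

    for _ in range(n_rounds):
        # Bootstrap (with replacement) a set of provisional negatives.
        drawn = set(rng.choices(indices, k=sample_size))
        neg_centroid = centroid([unlabeled[i] for i in drawn])
        for i in indices:
            if i in drawn:
                continue  # score only out-of-bag points
            d_pos = distance(unlabeled[i], pos_centroid)
            d_neg = distance(unlabeled[i], neg_centroid)
            denom = d_pos + d_neg
            totals[i] += 0.5 if denom == 0 else d_neg / denom
            counts[i] += 1

    return [t / c if c else None for t, c in zip(totals, counts)]


# Hypothetical data: the first unlabeled point resembles the positives.
positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
unlabeled = [[1.0, 1.0], [0.0, 0.0], [0.1, -0.1],
             [-0.1, 0.1], [0.0, 0.1], [0.1, 0.0]]
scores = pu_bagging_scores(positives, unlabeled)
print([round(s, 2) for s in scores])
```

Scoring only out-of-bag points keeps a point from being judged by a model that was trained with it labeled as a negative, which is what makes the procedure usable when the unlabeled set itself is the prediction target.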