Synthetic data generation in omics mimics real-world biological data, providing alternatives for training and evaluating genomic analysis tools, controlling differential expression, and exploring data architecture. We previously developed Precious1GPT, a multimodal transformer trained on transcriptomic and methylation data, along with metadata, for predicting biological age and identifying dual-purpose therapeutic targets potentially implicated in aging and age-associated diseases. In this study, we introduce Precious2GPT, a multimodal architecture that integrates Conditional Diffusion (CDiffusion) and decoder-only Multi-omics Pretrained Transformer (MoPT) models trained on gene expression and DNA methylation data. Precious2GPT excels in synthetic data generation, outperforming Conditional Generative Adversarial Networks (CGANs) as well as standalone CDiffusion and MoPT models. We demonstrate that Precious2GPT generates representative synthetic data that captures tissue- and age-specific information from real transcriptomic and methylomic data. Notably, Precious2GPT surpasses the other models in age prediction accuracy on the generated data, and it can generate data for ages beyond 120 years. Furthermore, we showcase the potential of this model for identifying gene signatures and candidate therapeutic targets in a colorectal cancer case study.

Biological synthetic data generation in the context of omics refers to the creation of artificial datasets that mimic the characteristics of real biological data, particularly in genomics, transcriptomics, proteomics, and other high-throughput biological technologies [1]. Generating synthetic data is valuable for several reasons, including the development and validation of computational methods, the protection of privacy in sensitive datasets, and the augmentation of limited real-world data. Generative adversarial networks (GANs) have been introduced as models for generating synthetic genomic data, ranging from DNA sequences to bulk RNA-seq profiles [2,3]. Copula-based methods are a classical statistical alternative for generating synthetic omics data, particularly microarray gene expression data [4]. Diffusion models are a more recent deep learning approach to synthetic data generation: they simulate a diffusion process that gradually transforms a simple noise distribution into the target data distribution [5], as sketched below. Large language models (LLMs), exemplified by Generative Pre-trained Transformer 2 (GPT-2), have also garnered substantial interest; built on the Transformer architecture, they have made significant contributions to the analysis of sequential data through their capabilities in modeling and in advanced language understanding, generation, and prediction [6]. Although these models have shown promising results in
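To make the diffusion mechanism referenced above concrete, the following is a minimal, illustrative DDPM-style sketch in Python. It is not the Precious2GPT implementation: the linear beta schedule, the `denoiser(x_t, t, cond)` signature, the `cond` metadata dictionary, and the toy zero-noise denoiser are all assumptions introduced here purely for illustration of how a conditional diffusion sampler transforms noise into data.

```python
import numpy as np

# Assumed linear beta schedule over T steps (a common DDPM choice;
# the schedule used by Precious2GPT is not specified here).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_noise(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0): gradually corrupt a clean profile x0
    toward an isotropic Gaussian as t approaches T."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def reverse_sample(denoiser, cond, n_genes, rng):
    """Ancestral DDPM sampling: start from pure noise and iteratively
    denoise, conditioning each step on metadata `cond` (e.g. tissue, age).
    `denoiser` stands in for a trained network predicting the added noise."""
    x = rng.standard_normal(n_genes)
    for t in reversed(range(T)):
        eps_hat = denoiser(x, t, cond)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # inject noise at every step except the last
            x += np.sqrt(betas[t]) * rng.standard_normal(n_genes)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in denoiser (predicts zero noise); a real model would be
    # a neural network trained on expression or methylation profiles.
    toy_denoiser = lambda x, t, cond: np.zeros_like(x)
    profile = reverse_sample(toy_denoiser, {"tissue": "colon", "age": 70},
                             n_genes=8, rng=rng)
    print(profile)
```

The key design point this sketch illustrates is that conditioning enters only through the denoiser: the same noise-to-data sampling loop can produce, say, tissue- and age-specific expression profiles simply by passing different metadata to the trained network.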