Imputing missing RNA-seq data from DNA methylation by using transfer learning based neural network

Zhou, Xiang; Chai, Hua; Zhao, Huiying; Luo, Ching‐Hsing; Yang, Yuedong

doi:10.1101/803692

Cited by 6 publications

(6 citation statements)

References 35 publications

“…It reliably imputes RNA-seq data by making use of external data from DNA methylation probe datasets. TDimpute (Zhou X. et al, 2019) is a deep neural network (DNN)-based transfer learning approach that imputes missing gene expression data using DNA methylation datasets. It employs a DNN model to recover missing gene expression data by constructing a non-linear mapping between DNA methylation data and gene expression data.…”

Section: Integrating Epigenomic and Transcriptomic Data (mentioning)

confidence: 99%

“…Since gene-trait associations are mostly detected in strongly relevant tissues, it is recommended to use trait-relevant tissues in order to boost the correlation between GReX of related tissues (Zhang et al, 2019). For the TDimpute model, it can be further improved by integrating prior biological knowledge regarding the gene-gene interaction factors in order to reduce the parameters of the DNN model (Zhou X. et al, 2019).…”

Section: Integrating Epigenomic and Transcriptomic Data (mentioning)

confidence: 99%

A Review of Integrative Imputation for Multi-Omics Datasets

et al. 2020

“…The datasets and pretrained pan-cancer models supporting the results of this article are available in the Synapse with ID: syn21438134 [ 44 ]. Snapshots of our code and data further supporting this work are openly available in the GigaScience repository, GigaDB [ 45 ].…”

Section: Availability Of Supporting Data and Materials (mentioning)

confidence: 99%

Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning–based neural network

et al. 2020

Self Cite

Abstract. Background: Gene expression plays a key intermediate role in linking molecular features at the DNA level and phenotype. However, owing to various limitations in experiments, the RNA-seq data are missing in many samples while there exist high-quality of DNA methylation data. Because DNA methylation is an important epigenetic modification to regulate gene expression, it can be used to predict RNA-seq data. For this purpose, many methods have been developed. A common limitation of these methods is that they mainly focus on a single cancer dataset and do not fully utilize information from large pan-cancer datasets. Results: Here, we have developed a novel method to impute missing gene expression data from DNA methylation data through a transfer learning–based neural network, namely, TDimpute. In the method, the pan-cancer dataset from The Cancer Genome Atlas (TCGA) was utilized for training a general model, which was then fine-tuned on the specific cancer dataset. By testing on 16 cancer datasets, we found that our method significantly outperforms other state-of-the-art methods in imputation accuracy with a 7–11% improvement under different missing rates. The imputed gene expression was further proved to be useful for downstream analyses, including the identification of both methylation–driving and prognosis-related genes, clustering analysis, and survival analysis on the TCGA dataset. More importantly, our method was indicated to be useful for general purposes by an independent test on the Wilms tumor dataset from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project. Conclusions: TDimpute is an effective method for RNA-seq imputation with limited training samples.
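The pretrain-then-fine-tune idea described in the abstract can be illustrated with a minimal sketch. This is not the TDimpute implementation: the one-hidden-layer network, the layer sizes, the learning rates, and the synthetic "pan-cancer" and "target cancer" matrices are all assumptions chosen only to show the workflow of training a general methylation-to-expression model on a large dataset and then continuing training on a small one.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    # Small random weights; sizes are illustrative, not taken from the paper.
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def predict(params, X):
    # Non-linear mapping: methylation features -> expression values.
    H = np.tanh(X @ params["W1"] + params["b1"])
    return H @ params["W2"] + params["b2"]

def train(params, X, Y, lr, epochs):
    # Full-batch gradient descent on mean squared error.
    # Works on a copy, so the input parameters are left untouched.
    p = {k: v.copy() for k, v in params.items()}
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ p["W1"] + p["b1"])
        err = H @ p["W2"] + p["b2"] - Y
        losses.append(float(np.mean(err ** 2)))
        d_out = 2.0 * err / err.size
        d_hidden = (d_out @ p["W2"].T) * (1.0 - H ** 2)
        p["W2"] -= lr * (H.T @ d_out)
        p["b2"] -= lr * d_out.sum(axis=0)
        p["W1"] -= lr * (X.T @ d_hidden)
        p["b1"] -= lr * d_hidden.sum(axis=0)
    return p, losses

rng = np.random.default_rng(0)
n_cpg, n_genes = 20, 5  # hypothetical feature counts

# Synthetic stand-in for a large pan-cancer dataset (assumption).
A = rng.normal(0.0, 0.5, size=(n_cpg, n_genes))
X_pan = rng.uniform(0.0, 1.0, size=(500, n_cpg))
Y_pan = X_pan @ A + rng.normal(0.0, 0.1, size=(500, n_genes))

# Synthetic stand-in for a small cancer-specific dataset whose
# mapping differs slightly from the pan-cancer one (assumption).
A_target = A + rng.normal(0.0, 0.1, size=A.shape)
X_target = rng.uniform(0.0, 1.0, size=(30, n_cpg))
Y_target = X_target @ A_target + rng.normal(0.0, 0.1, size=(30, n_genes))

# Step 1: train a general model on the large dataset.
pretrained, pre_losses = train(
    init_params(n_cpg, 16, n_genes, rng), X_pan, Y_pan, lr=0.05, epochs=300
)

# Step 2: fine-tune the same weights on the small dataset,
# using a smaller learning rate and fewer epochs.
finetuned, ft_losses = train(pretrained, X_target, Y_target, lr=0.01, epochs=100)

# Impute expression for the target samples from methylation alone.
Y_hat = predict(finetuned, X_target)
```

In this sketch the fine-tuning step starts from the pretrained weights rather than from a fresh initialization, which is the part that makes it transfer learning; everything else is an ordinary regression network.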

“…Furthermore, to impute entirely missing expression profiles, one must incorporate additional domain information, such as data from nearby time points or functional relationships between genes' expression patterns. In another example, transfer learning has been used to impute entire bulk RNA-sequencing profiles when methylation profiles for the same samples are available (Zhou et al, 2020). Here, we use expression profiles of related drugs and cells.…”

Section: Introduction (mentioning)

confidence: 99%

Cell-specific imputation of drug connectivity mapping with incomplete data

Sapashnik, Newman, Pietras et al. 2020

Preprint

Motivation: Drug re-positioning allows expedited discovery of new applications for existing compounds, but re-screening vast compound libraries is often prohibitively expensive. "Connectivity mapping" is a process that links drugs to diseases by identifying drugs whose impact on expression in a collection of cells most closely reverses the disease's impact on expression in disease-relevant tissues. The high throughput LINCS project has expanded the universe of compounds, cellular perturbations, and cell types for which data are available, but even with this effort, many potentially clinically useful combinations are missing. To evaluate the possibility of finding disease-relevant drug connectivity despite missing data, we compared methods using cross-validation on a complete subset of the LINCS data. Results: Modified recommender systems with either neighborhood-based or SVD imputation methods were compared to autoencoders and two naive methods. All were evaluated for accuracy in prediction of both expression signatures and connectivity query responses. We demonstrate that cellular context is important, and that it is possible to predict cell-specific drug responses with improved accuracy over naive approaches. Neighborhood-based collaborative filtering was the most successful, improving prediction accuracy in all tested cells. We conclude that even for cells in which drug responses have not been fully characterized, it is possible to identify drugs that reverse the expression signatures observed in disease.
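As a generic illustration of the SVD imputation family mentioned in this abstract, the sketch below fills missing matrix entries by iterated low-rank reconstruction. It is not the method evaluated in the cited preprint: the function name `svd_impute`, the rank, the iteration count, and the column-mean starting values are assumptions, and it presumes every column has at least one observed value.

```python
import numpy as np

def svd_impute(M, rank, n_iter=50):
    # Missing entries are marked with NaN in the input matrix.
    M = np.asarray(M, dtype=float)
    missing = np.isnan(M)
    # Start by filling each missing entry with its column mean.
    col_means = np.nanmean(M, axis=0)
    filled = np.where(missing, col_means, M)
    for _ in range(n_iter):
        # Truncated SVD gives a low-rank approximation of the current fill.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only missing entries are overwritten; observed values are kept.
        filled = np.where(missing, low_rank, M)
    return filled

# Hypothetical rank-1 matrix with two entries hidden.
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0])
observed = truth.copy()
observed[0, 1] = np.nan
observed[2, 2] = np.nan
completed = svd_impute(observed, rank=1)
```

The rank controls how much structure is shared across rows and columns; in practice it would be chosen by cross-validation on entries that are deliberately held out.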
