Assessing the impact of transcriptomics data analysis pipelines on downstream functional enrichment results

Paton, Victor; Gabor, Attila; Flores, Ricardo Omar Ramirez; Badia-i-Mompel, Pau; Tanevski, Jovan; Garrido-Rodriguez, Martin; Saez-Rodriguez, Julio

doi:10.1101/2023.09.13.557538

Cited by 2 publications

(1 citation statement)

References 54 publications

“…The RNA-Seq data was preprocessed following a classical bioinformatics pipeline. While different preprocessing options were tested (data and results not shown) as they can impact downstream analyses (Paton et al 2023), we decided to choose a fixed standard choice to keep a reasonable computational budget and compare models all else being equal. For all the experiments without pre-training on external datasets, we selected the 5,000 most variable genes on the training sets, applied a logarithmic operation and normalized the data with mean-standard scaling.…”

Section: Preprocessing Datasets and Gene Selection (mentioning)

confidence: 99%
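The preprocessing described in the citation statement above (variable-gene selection on the training set, a logarithmic transform, then mean-standard scaling) can be sketched roughly as follows. This is an illustrative sketch, not the citing authors' code: the function names, the use of `log1p` as the "logarithmic operation", ranking genes by variance of the untransformed values, and the small `eps` guard are all assumptions. The only details taken from the quoted text are the 5,000-gene cutoff, the order of the steps, and the fact that gene selection is done on the training sets.

```python
import numpy as np


def fit_preprocessing(train, n_genes=5000, eps=1e-8):
    """Fit gene selection and scaling statistics on the training matrix only.

    train: array of shape (n_samples, n_total_genes) with non-negative values.
    Returns a dict of parameters (hypothetical structure, for illustration).
    """
    # Rank genes by variance across training samples and keep the top n_genes.
    variances = train.var(axis=0)
    n_keep = min(n_genes, train.shape[1])
    selected = np.sort(np.argsort(variances)[::-1][:n_keep])

    # Assumption: log1p stands in for the unspecified "logarithmic operation".
    logged = np.log1p(train[:, selected])

    # Per-gene mean and standard deviation, estimated on the training set.
    mean = logged.mean(axis=0)
    std = logged.std(axis=0) + eps  # eps avoids division by zero
    return {"genes": selected, "mean": mean, "std": std}


def apply_preprocessing(x, params):
    """Apply the fitted selection, log transform, and scaling to any split."""
    logged = np.log1p(x[:, params["genes"]])
    return (logged - params["mean"]) / params["std"]


# Toy usage with synthetic data (not real expression values).
rng = np.random.default_rng(0)
train = rng.gamma(shape=2.0, scale=50.0, size=(40, 200))
test = rng.gamma(shape=2.0, scale=50.0, size=(10, 200))

params = fit_preprocessing(train, n_genes=50)
train_scaled = apply_preprocessing(train, params)
test_scaled = apply_preprocessing(test, params)
```

Fitting the gene list and the scaling statistics on the training split and reusing them for the test split keeps test information out of preprocessing, which matches the statement that genes were selected "on the training sets".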

Robust Evaluation of Deep Learning-based Representation Methods for Survival and Gene Essentiality Prediction on Bulk RNA-seq Data

Gross, Dauvin, Cabeli et al. 2024

Preprint

Deep learning (DL) has shown potential to provide powerful representations of bulk RNA-seq data in cancer research. However, there is no consensus regarding the impact of design choices of DL approaches on the performance of the learned representation, including the model architecture, the training methodology and the various hyperparameters. To address this problem, we rigorously evaluate the performance of various design choices of DL representation learning methods using public pan-cancer datasets, and assess their predictive power for survival and gene essentiality predictions. We demonstrate that non-DL-based baseline methods achieve comparable or superior performance compared to more complex models on survival prediction tasks. DL representation methods, however, are the most efficient to predict the gene essentiality of cell lines. We show that auto-encoders (AE) are consistently improved by techniques such as masking and multi-head training. Our results suggest that the impact of DL representations and of pre-training is highly task- and architecture-dependent, highlighting the need for adopting rigorous evaluation guidelines. These guidelines for robust evaluation are implemented in a pipeline made available to the research community.

Robust evaluation of deep learning-based representation methods for survival and gene essentiality prediction on bulk RNA-seq data

Gross, Dauvin, Cabeli et al. 2024

Sci Rep

Deep learning (DL) has shown potential to provide powerful representations of bulk RNA-seq data in cancer research. However, there is no consensus regarding the impact of design choices of DL approaches on the performance of the learned representation, including the model architecture, the training methodology and the various hyperparameters. To address this problem, we evaluate the performance of various design choices of DL representation learning methods using TCGA and DepMap pan-cancer datasets and assess their predictive power for survival and gene essentiality predictions. We demonstrate that baseline methods achieve comparable or superior performance compared to more complex models on survival prediction tasks. DL representation methods, however, are the most efficient to predict the gene essentiality of cell lines. We show that auto-encoders (AE) are consistently improved by techniques such as masking and multi-head training. Our results suggest that the impact of DL representations and of pretraining is highly task- and architecture-dependent, highlighting the need for adopting rigorous evaluation guidelines. These guidelines for robust evaluation are implemented in a pipeline made available to the research community.
