Deep learning-based pan-cancer classification model reveals cancer-specific gene expression signatures

Divate, Mayur; Tyagi, Aayush Kumar; Richard, Derek J.; Prasad, Prathosh A.; Gowda, Harsha; Nagaraj, Shivashankar H.

doi:10.1101/2021.03.15.435283

Cited by 2 publications (9 citation statements)

References 37 publications


“…There are some limitations associated with our pan-cancer analysis-based assessment of cancer tissue-of-origin specific gene expression signatures [ 31 ]. For example, we only considered genes that were expressed at sufficiently high levels (≥5 FPKM) in at least 50% of samples within a cancer type.…”

Section: Discussion (mentioning, confidence: 99%)

Deep Learning-Based Pan-Cancer Classification Model Reveals Tissue-of-Origin Specific Gene Expression Signatures

Divate, Tyagi, Richard et al. 2022, Cancers (self-citation)

Cancer tissue-of-origin specific biomarkers are needed for effective diagnosis, monitoring, and treatment of cancers. In this study, we analyzed transcriptomics data from 37 cancer types provided by The Cancer Genome Atlas (TCGA) to identify cancer tissue-of-origin specific gene expression signatures. We developed a deep neural network model to classify cancers based on gene expression data. The model achieved a predictive accuracy of >97% across cancer types, indicating the presence of distinct cancer tissue-of-origin specific gene expression signatures. We interpreted the model using Shapley additive explanations to identify specific gene signatures that significantly contributed to cancer-type classification. We evaluated the model and the validity of gene signatures using an independent test data set from the International Cancer Genome Consortium. In conclusion, we present a robust neural network model for accurate classification of cancers based on gene expression data and also provide a list of gene signatures that are valuable for developing biomarker panels for determining cancer tissue-of-origin. These gene signatures serve as valuable biomarkers for determining tissue-of-origin for cancers of unknown primary.
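The expression filter mentioned in the quoted limitation can be sketched as below. It keeps genes with ≥5 FPKM in at least 50% of the samples within a cancer type. This is a minimal illustration, not the authors' code. It assumes a genes × samples FPKM matrix held as a NumPy array and one cancer-type label per sample. The function name `expressed_genes` and the toy data are hypothetical.

```python
import numpy as np


def expressed_genes(fpkm, cancer_types, min_fpkm=5.0, min_fraction=0.5):
    """Hypothetical helper: per cancer type, a boolean mask of genes passing the filter.

    fpkm: array of shape (n_genes, n_samples) with FPKM values.
    cancer_types: length-n_samples sequence with one cancer-type label per sample.
    A gene passes for a cancer type if it reaches min_fpkm in at least
    min_fraction of that type's samples.
    """
    fpkm = np.asarray(fpkm, dtype=float)
    labels = np.asarray(cancer_types)
    masks = {}
    for cancer in np.unique(labels):
        samples = fpkm[:, labels == cancer]            # genes x samples of this type
        frac_expressed = (samples >= min_fpkm).mean(axis=1)
        masks[cancer.item()] = frac_expressed >= min_fraction
    return masks


# Toy data (invented): 3 genes, 4 samples from two hypothetical cancer types.
fpkm = np.array([
    [10.0, 6.0, 0.5, 1.0],   # gene 0: expressed in type A only
    [1.0, 2.0, 8.0, 9.0],    # gene 1: expressed in type B only
    [7.0, 0.0, 5.0, 0.0],    # gene 2: expressed in half of each type's samples
])
types = ["A", "A", "B", "B"]
masks = expressed_genes(fpkm, types)
print({k: v.tolist() for k, v in masks.items()})
# {'A': [True, False, True], 'B': [False, True, True]}
```

Returning one mask per cancer type keeps the "within a cancer type" condition explicit. Whether the per-type gene sets are then unioned or intersected is left open here.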



“…The prediction accuracy varies by tumor type, with some tumor types being mispredicted more frequently than others. Misclassifications are more frequent among groups of cancers arising from the same organ (e.g., kidney renal clear cell carcinoma, kidney renal papillary cell carcinoma and kidney chromophobe carcinoma, or lung adenocarcinoma and lung squamous cell carcinoma) and/or among cancers represented by a small number of samples in the training set (e.g., cholangiocarcinoma, which is frequently predicted as liver hepatocellular carcinoma and vice versa), as noted in multiple studies (Bagge et al, 2018;Lyu and Haque, 2018;Bavafaye Haghighi et al, 2019;De Guia et al, 2019;Grewal et al, 2019;Zhao et al, 2020;Vibert et al, 2021;Divate et al, 2022;Jones et al, 2022;Moiso et al, 2022). All of this implies that the distribution of different cancer types in the training set is one of the key factors contributing to the prediction accuracy of the model.…”

Section: Performance of Models for Tissue-of-Origin Prediction (mentioning, confidence: 99%)
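The quoted studies identify error patterns, such as kidney or lung subtypes being mistaken for one another, by looking at per-cancer-type accuracy and at which label each type is most often confused with. The plain-Python sketch below shows how such a tally could be computed from true and predicted labels. The function name `per_class_report` is hypothetical. The TCGA-style labels are invented placeholders, not results from any cited study.

```python
from collections import Counter, defaultdict


def per_class_report(y_true, y_pred):
    """Hypothetical helper: {class: (accuracy, most frequent wrong label or None)}."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    confusions = defaultdict(Counter)       # true label -> counts of wrong predictions
    for t, p in zip(y_true, y_pred):
        if t != p:
            confusions[t][p] += 1
    report = {}
    for cls, n in totals.items():
        wrong = confusions[cls].most_common(1)
        report[cls] = (correct[cls] / n, wrong[0][0] if wrong else None)
    return report


# Invented labels for illustration only.
y_true = ["KIRC", "KIRC", "KIRP", "KIRP", "CHOL", "CHOL", "LIHC", "LIHC"]
y_pred = ["KIRC", "KIRP", "KIRP", "KIRP", "LIHC", "CHOL", "LIHC", "LIHC"]
for cls, (acc, confused_with) in per_class_report(y_true, y_pred).items():
    print(f"{cls}: accuracy={acc:.2f}, most often confused with {confused_with}")
```

Comparing these per-class figures with each class's share of the training set is one way to check the quoted claim that class distribution drives accuracy.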

“…The majority of studies were based on deep learning, using neural networks of different architectures (Lyu and Haque, 2018;Azarkhalili et al, 2019;De Guia et al, 2019;He et al, 2020b;Mostavi et al, 2020;Zhao et al, 2020;Vibert et al, 2021;Divate et al, 2022;Hong et al, 2022;Jones et al, 2022;Moiso et al, 2022). Several studies utilized ensemble learning methods, in which the final prediction is a combination of multiple predictors (Grewal et al, 2019;He et al, 2020a;Ramroach et al, 2020;Chen et al, 2021;Liu et al, 2021).…”

Section: Machine Learning in Cancer of Unknown Primary Classification... (mentioning, confidence: 99%)
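The quoted passage describes ensemble methods, where the final prediction combines several predictors. The cited studies do not necessarily combine them the same way. The sketch below illustrates one common scheme, hard majority voting, as an assumed example rather than the method of any particular study. The function name `majority_vote` and the tie-breaking rule (favour the earliest model's label) are hypothetical choices.

```python
from collections import Counter


def majority_vote(predictions):
    """Hypothetical helper: combine per-model predictions by hard majority voting.

    predictions: list of per-model label lists of equal length
    (predictions[m][i] is model m's label for sample i).
    Ties go to the tied label predicted by the earliest model.
    """
    combined = []
    for sample_votes in zip(*predictions):     # all models' votes for one sample
        counts = Counter(sample_votes)
        best = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        combined.append(next(v for v in sample_votes if counts[v] == best))
    return combined


# Three hypothetical models voting on three samples.
model_a = ["LUAD", "LUSC", "BRCA"]
model_b = ["LUAD", "LUAD", "BRCA"]
model_c = ["LUSC", "LUSC", "OV"]
print(majority_vote([model_a, model_b, model_c]))  # ['LUAD', 'LUSC', 'BRCA']
```

Soft voting, which averages predicted class probabilities, and stacking are alternative combination rules an ensemble might use instead.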

“…The remaining studies used independent test sets consisting of 5-18 cancer types and showed a reduction in prediction accuracy ranging from 5.8% to 26.47% compared to cross-validation (with a mean reduction in accuracy of 13.32%). Out of these, two studies used metastatic samples (Bavafaye Haghighi et al, 2019;Zhao et al, 2020), two used a mixture of primary and metastatic samples (Bagge et al, 2018;Divate et al, 2022) and one did not report the exact source of the independent test set (Chen et al, 2021).…”

Section: Performance of Models for Tissue-of-Origin Prediction (mentioning, confidence: 99%)

“…Interestingly, when testing the accuracy of the same model on test sets composed of both primary and metastatic samples, metastatic samples showed a lower prediction accuracy. For example, Divate et al (2022) reported 88.10% accuracy for metastatic samples compared to 92.13% for primary samples. Bagge et al (2018) found 53.84% accuracy for metastatic samples compared to 96.67% for patient-derived xenografts of primary cancer and 100% for primary cancer.…”

Section: Performance of Models for Tissue-of-Origin Prediction (mentioning, confidence: 99%)


Machine learning for pan-cancer classification based on RNA sequencing data

Štancl, Karlić 2023, Front. Mol. Biosci.

Despite recent improvements in cancer diagnostics, 2%-5% of all malignancies are still cancers of unknown primary (CUP), for which the tissue-of-origin (TOO) cannot be determined at the time of presentation. Since the primary site of cancer leads to the choice of optimal treatment, CUP patients pose a significant clinical challenge with limited treatment options. Data produced by large-scale cancer genomics initiatives, which aim to determine the genomic, epigenomic, and transcriptomic characteristics of a large number of individual patients of multiple cancer types, have led to the introduction of various methods that use machine learning to predict the TOO of cancer patients. In this review, we assess the reproducibility, interpretability, and robustness of results obtained by 20 recent studies that utilize different machine learning methods for TOO prediction based on RNA sequencing data, including their reported performance on independent data sets and identification of important features. Our review investigates the strengths and weaknesses of different methods, checks the correspondence of their results, and identifies potential issues with datasets used for model training and testing, assessing their potential usefulness in a clinical setting and suggesting future improvements.

