2020
DOI: 10.3389/fgene.2020.585029

A Novel XGBoost Method to Identify Cancer Tissue-of-Origin Based on Copy Number Variations

Abstract: Identifying the tissue of origin in cancer of unknown primary (CUP) is of great significance for designing more effective treatments and improving diagnostic efficiency in cancer patients. In this study, we develop a machine learning model that traces the tissue of origin of CUP with high accuracy, arrived at through feature engineering and model evaluation. Using copy number variation data comprising 4,566 training cases and 1,262 independent validation cases, we apply an XGBoost classifier to 10 types of cancer. Ext…
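As background for the abstract's XGBoost classifier, here is a minimal sketch of how a multiclass boosted-tree model turns tree outputs into a single tissue-of-origin call. It assumes XGBoost's `multi:softprob` objective, under which each boosting round grows one tree per class. A class's margin is the sum of the leaf values its trees assign to a sample. A softmax converts the margins to probabilities, and the argmax is the predicted origin. The function names and toy numbers are hypothetical. This is not the paper's pipeline, and neither its feature engineering nor its trained trees are reproduced here.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating multiclass boosted-tree prediction;
# not taken from the paper.


def class_margins(leaf_values: Sequence[Sequence[float]]) -> List[float]:
    """leaf_values[k] holds the leaf outputs that the class-k trees give one
    sample (one tree per class per boosting round). A constant base score
    shared by all classes is omitted because it does not change the softmax."""
    return [sum(values) for values in leaf_values]


def softmax(margins: Sequence[float]) -> List[float]:
    # Subtract the largest margin before exponentiating for numerical stability.
    top = max(margins)
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def predict_origin(margins: Sequence[float]) -> int:
    # The predicted tissue of origin is the class with the highest probability.
    probs = softmax(margins)
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(all_margins: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    # Fraction of held-out cases (e.g. a validation set) whose predicted class
    # matches the true label.
    correct = sum(predict_origin(m) == y for m, y in zip(all_margins, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy example with 3 classes and 2 boosting rounds (hypothetical values).
    margins = class_margins([[0.2, 0.1], [1.0, 0.5], [0.0, -0.3]])
    print(predict_origin(margins))
```

With 10 cancer types, each sample would have 10 margins. Because the class with the largest summed leaf value wins, the softmax matters for calibrated probabilities rather than for the argmax itself.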

Cited by 23 publications (19 citation statements); references 32 publications.

“…In previous studies, these models have been shown to be of great value in medical imaging (5)(6)(7)(8). In oncology, they are used in the diagnosis of malignant tumors (9,10), prediction of clinical burden before and after cancer surgery (11), and prediction of adverse reactions to adjuvant therapy (12), among other applications. They have also been shown to be effective in the prognostic prediction of malignant tumors (13,14).…”
Section: Introduction (mentioning; confidence: 99%)
“…While in this work we focused on tumor transcriptome data which can be measured with high precision over a wide dynamic range of transcript abundances by RNA-seq, we note that TCGA datasets of tumor somatic mutations and copy number alteration events are also available (Hutter and Zenklusen, 2018). Given the voluminous literature on the use of tumor somatic genomic data for precision cancer diagnosis (Mitchel et al., 2019; Zhang et al., 2020; Lee et al., 2019), tumor DNA datasets are fertile ground for developing a semi-supervised, multi-omics model for predicting response to chemotherapy.…”
Section: Discussion (mentioning; confidence: 99%)
“…While in this work we focused on tumor transcriptome data which can be measured with high precision over a wide dynamic range of transcript abundances by RNA-seq, we note that TCGA datasets of tumor somatic mutations and copy number alteration events are also available [17]. Given the voluminous literature on the use of tumor somatic genomic data for precision cancer diagnosis [37][38][39], tumor DNA datasets are fertile ground for developing a semi-supervised, multi-omics model for predicting response to chemotherapy. Second, for decision tree-based response-to-chemotherapy prediction, the performance of VAE-encoded transcriptome features is somewhat sensitive to the type of normalization used for the gene expression levels (data not shown).…”
Section: Discussion (mentioning; confidence: 99%)