2020
DOI: 10.1007/s40484-020-0226-1

Performance‐weighted‐voting model: An ensemble machine learning method for cancer type classification using whole‐exome sequencing mutation

Abstract: Background: With improvements in next‐generation DNA sequencing technology, lower cost is needed to collect genetic data. More machine learning techniques can be used to help with cancer analysis and diagnosis. Methods: We developed an ensemble machine learning system named performance‐weighted‐voting model for cancer type classification in 6,249 samples across 14 cancer types. Our ensemble system consists of five weak classifiers (logistic regression, SVM, random forest, XGBoost and neural networks). We first used…
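The abstract is truncated before it describes how the voting weights are derived, so the following is only a minimal sketch of the general idea of performance-weighted soft voting, not the authors' procedure. It assumes each base classifier outputs class probabilities and that each weight is proportional to a validation score such as accuracy. The function name `performance_weighted_vote`, the toy probabilities, and the scores are all hypothetical.

```python
import numpy as np


def performance_weighted_vote(probas, scores):
    """Combine class probabilities from several classifiers.

    probas: shape (n_classifiers, n_samples, n_classes)
    scores: shape (n_classifiers,), a per-classifier performance measure.
            ASSUMPTION: weights are proportional to these scores; the
            source does not state the weighting rule.
    Returns (predicted_labels, combined_probabilities).
    """
    probas = np.asarray(probas, dtype=float)
    scores = np.asarray(scores, dtype=float)
    weights = scores / scores.sum()  # normalize so the weights sum to 1
    # Weighted average over the classifier axis -> (n_samples, n_classes)
    combined = np.tensordot(weights, probas, axes=1)
    return combined.argmax(axis=1), combined


# Toy data (hypothetical): 3 classifiers, 2 samples, 3 classes.
toy_probas = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]],
    [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]],
]
toy_scores = [0.8, 0.4, 0.8]  # hypothetical validation accuracies

labels, combined = performance_weighted_vote(toy_probas, toy_scores)
print(labels)
print(combined)
```

With equal scores this reduces to plain soft voting (an unweighted average of probabilities), which is the baseline the citation statements on this page compare against.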

Citations: cited by 20 publications (13 citation statements)
References: 79 publications
“…Furthermore, from their research soft voting model had the overall accuracy output comparatively lesser than that of the performance model. However, from the present research the soft-voting ensemble model performed much better as compared the performance model, due to the three classifiers that were used (as mentioned in "Weighted ensemble learning classifier" section), being able to distinguish and give better probability values as compared to the five weak classifiers used in Li et al [71]. The model designed in the present work also resulted in much larger true positives, and hence a better method for the early prediction of 5 classes of cancer as mentioned in "Data clean-up and obtaining a derived dataset" section.…”
Section: Discussion (mentioning)
confidence: 60%
“…In Li et al [71], the reported overall accuracy was 71.46% for the classification of 14 types of cancer class with the use of performance weighted voting ensemble on five classifiers, logistic regression, support vector machine, random forest, XGBoost and neural networks. From Table 7, the overall weighted accuracy for 8-cancer types calculated for the five classifiers mentioned above, was well below 70% [71]. Only the performance weighted voting ensemble model resulted in an overall accuracy of 71.46 [71].…”
Section: Discussion (mentioning)
confidence: 99%
“…In addition, we have included a feature in the software that allows for stringing together of consecutive operon pairs into multi-gene operons. Other applications in genomics where ensemble methods have proven very useful include annotation of genomic islands, detection of genomic mutations, and gene expression-based phenotype prediction [50][51][52][53]. The development of these flexible methods is critical for weathering the natural and technical variation between organisms and data sets, which we can see even between the data sets that we chose to analyze in this study.…”
Section: Discussion (mentioning)
confidence: 99%
“…We also provide the code required to re-train our models as data acquisition evolves and novel sequencing data types emerge, which given the statistical front-end transformation, should be broadly applicable. Other applications in genomics where ensemble methods have proven very useful include annotation of genomic islands, detection of genomic mutations, and gene expression-based phenotype prediction [44][45][46][47]. The development of these flexible methods is critical for weathering the natural and technical variation between organisms and data sets, which we can see even between the data sets that we chose to analyze in this study.…”
Section: Discussion (mentioning)
confidence: 99%