2019
DOI: 10.3389/fmolb.2019.00044

Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations

Abstract: Development of machine learning solutions for prediction of the functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the recent decade. In this work, we integrate different machine learning approaches, including tree-based methods, random forest and gradient boosted tree (GBT) classifiers, along with deep convolutional neural networks (CNN) for prediction of cancer driver mutations in the genomic datasets. The f…

Cited by 45 publications (50 citation statements)
References 117 publications
“…Despite the slightly better accuracy linked to data with one-hot encoding than standard encoding, we found no statistical differences between the two methods. This finding is inconsistent with previous reports,[28, 29] and needs further validation. We also noticed that the minimal depths of trees in our best-fit RF models were usually 1 to 3.…”
Section: Discussion (contrasting)
confidence: 99%
“…[28] This approach was also used in machine learning models of cancer driver genes. [29] For one-hot encoding, all multicategory variables (i.e. discrete variables with more than two categories) were transformed into a new set of binary variables.…”
Section: Methods (mentioning)
confidence: 99%
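
The one-hot encoding described in the statement above can be illustrated with a minimal sketch. It is not the citing authors' code: the helper `one_hot_encode`, the column name `mutation_type`, and the toy rows are all hypothetical and chosen only to show how one multicategory variable becomes a set of binary variables.

```python
from typing import Dict, List


def one_hot_encode(rows: List[Dict[str, object]], column: str) -> List[Dict[str, object]]:
    """Replace one multicategory column with one binary column per category."""
    # Collect the distinct categories; sorting keeps the column order stable.
    categories = sorted({str(row[column]) for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the column being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one 0/1 indicator per category.
        for category in categories:
            new_row[f"{column}={category}"] = 1 if str(row[column]) == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical toy data: a discrete variable with three categories.
samples = [
    {"id": "s1", "mutation_type": "missense"},
    {"id": "s2", "mutation_type": "nonsense"},
    {"id": "s3", "mutation_type": "silent"},
    {"id": "s4", "mutation_type": "missense"},
]

encoded_samples = one_hot_encode(samples, "mutation_type")
for row in encoded_samples:
    print(row)
```

Each encoded row has exactly one indicator set to 1, so a tree-based model can split on a single category without treating the categories as ordered.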
“…This success notwithstanding, the design of new MSM algorithms in coupling different scales, data utilization, and their implementation on HPC is becoming increasingly cumbersome in the face of heterogeneous data availability and rapidly evolving HPC architectures and platforms. On the other hand, while purely data-driven models of molecular and cellular systems spawned by the techniques of data science [132][133][134], and in particular, ML methods including deep learning methods [135][136][137], are easy to train and implement, the underlying model manifests as a black-box. This general approach taken by the ML community is well suited for classification, learning, and regression problems, but suffers from limitations in interpretability and explainability, especially when mechanism-based understanding is a primary goal.…”
Section: Integrating MSM and ML To Elucidate The Emergence Of Functio… (mentioning)
confidence: 99%
“…In addition, the prediction results of the current model are often difficult to develop the drug discovery for clinical trials (Gayvert et al., 2016; Neves et al., 2018; Vamathevan et al., 2019). A deep learning (DL) approach with convolutional neural networks (CNNs), Rectified Linear Unit (ReLU), and max pooling is a promising, powerful tool for the classification modeling (Date and Kikuchi, 2018; Öztürk et al., 2018; Wang et al., 2018; Agajanian et al., 2019; Idakwo et al., 2019; Jo et al., 2019), where factors affecting its prediction performance include sufficient size, suitable representation, and accurate labeling of supervised input datasets (Bello et al., 2019; Chauhan et al., 2019; Liu P. et al., 2019). To resolve these issues, the DL-based QSAR modeling approach using molecular images produced by 3D chemical structure as input data was previously developed and referred to as the DeepSnap-DL approach (Uesawa, 2018).…”
Section: Introduction (mentioning)
confidence: 99%
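
The three CNN building blocks named in the statement above (convolution, ReLU, and max pooling) can be sketched in one dimension with plain Python. This is an illustrative toy, not the architecture of any cited paper: the functions `conv1d`, `relu`, and `max_pool1d`, the kernel, and the input signal are all assumptions made for the example.

```python
from typing import List


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid (no padding) sliding dot product, as used in CNN layers."""
    width = len(kernel)
    return [
        sum(signal[start + offset] * kernel[offset] for offset in range(width))
        for start in range(len(signal) - width + 1)
    ]


def relu(values: List[float]) -> List[float]:
    """Rectified Linear Unit: negative activations are set to zero."""
    return [max(0.0, value) for value in values]


def max_pool1d(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [
        max(values[start:start + size])
        for start in range(0, len(values) - size + 1, size)
    ]


# Hypothetical input and a two-tap difference kernel.
signal = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]
kernel = [1.0, -1.0]

features = conv1d(signal, kernel)   # [3.0, -5.0, 2.5, 1.5, -3.0]
activated = relu(features)          # [3.0, 0.0, 2.5, 1.5, 0.0]
pooled = max_pool1d(activated, 2)   # [3.0, 2.5]
print(pooled)
```

In a real CNN the kernel weights are learned during training and the same three steps are applied over two-dimensional inputs with many kernels; a deep learning library would normally be used in place of these hand-written loops.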