2020
DOI: 10.1016/j.chempr.2020.05.014

Learning to Make Chemical Predictions: The Interplay of Feature Representation, Data, and Machine Learning Methods

Abstract: Recently, supervised machine learning has been ascending in providing new predictive approaches for chemical, biological, and materials sciences applications. In this Perspective, we focus on the interplay of machine learning methods with the chemically motivated descriptors and the size and type of datasets needed for molecular property prediction. Using nuclear magnetic resonance chemical shift prediction as an example, we demonstrate that success is predicated on the choice of feature extracted or real-spac…

Citations: cited by 85 publications (75 citation statements)
References: 59 publications
“…However, long MD simulations generate a large amount of high-dimensional data, making it difficult to identify which region or residues of the RBD make significant contributions to these differences. 33,34,36 ML approaches can be excellent tools for identifying differences in MD trajectories 34,35,37,42 and have recently been applied in computational studies related to SARS-CoV-2. 35,43,44 Inspired by the work of Fleetwood et al, 34 we trained ML classifiers to distinguish between the configurations from the SARS-CoV and SARS-CoV-2 trajectories, and, in the process, rate the importance of each feature to the classification.…”
Section: Identification Of Residues Distinguishing Sars-cov From Sars (mentioning)
confidence: 99%
“…These simulations produce an extraordinary amount of data, which poses serious challenges for analysis and interpretation. [33][34][35][36][37][38] Similar to the recent work of Fleetwood et al, 34 we use supervised machine learning (ML) approaches to assist in identifying those residues that contribute the most to dynamical differences between the two viral RBDs. Based on this identification, we further quantify the relative changes in binding free energy for the RBD with ACE2 via free energy perturbation (FEP) calculations.…”
Section: Introduction (mentioning)
confidence: 99%
“…The factors that affect the performances of prediction models can be basically grouped into four categories (Fig. 1a), where the first two pertain to the data and the latter two pertain to the model [4]:…”
Section: Training and Testing The Model (mentioning)
confidence: 99%
“…Accordingly, the importance of the interpretability for machine learning models in chemistry has been outlined before. 102 One pathway towards inherently explainable artificial intelligence could be the re-emergence of symbolic artificial intelligence. 103 In chemistry, interpretability goes hand-in-hand with the representation of a given problem.…”
Section: New Ideas and Paradigms (mentioning)
confidence: 99%