2021
DOI: 10.1021/acs.jcim.1c00610

Predicting Solvent-Dependent Nucleophilicity Parameter with a Causal Structure Property Relationship

Abstract: Solvent-dependent reactivity is a key aspect of synthetic science, which controls reaction selectivity. The contemporary focus on new, sustainable solvents highlights a need for reactivity predictions in different solvents. Herein, we report the excellent machine learning prediction of the nucleophilicity parameter N in the four most-common solvents for nucleophiles in the Mayr’s reactivity parameter database (R² = 0.93 and 81.6% of predictions within ±2.0 of the experimental values with Extra Trees algorithm…
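The two figures of merit quoted in the abstract, the coefficient of determination R² and the share of predictions within ±2.0 of experiment, can be computed as in the minimal sketch below. The function names and the toy values are illustrative assumptions, not data or code from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_within(y_true, y_pred, tolerance=2.0):
    """Fraction of predictions whose absolute error is <= tolerance."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy nucleophilicity values (hypothetical, for illustration only).
n_experimental = [10.0, 12.0, 14.0, 16.0, 18.0]
n_predicted = [10.5, 11.0, 17.0, 16.0, 18.5]

print(r_squared(n_experimental, n_predicted))
print(fraction_within(n_experimental, n_predicted, tolerance=2.0))
```

A high R² and a high within-tolerance share measure different things: the first is dominated by large errors, the second only counts how many predictions land inside the window.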

Citations: cited by 22 publications (25 citation statements)
References: 35 publications (72 reference statements)
“…To determine the optimal values of distance cutoff (R_cut), number of interaction blocks (T), and feature dimensions of atom representations (d of X_t, Figure) and the solvent embedding (d_sol), Z-SchNet-CDFT with different hyperparameter combinations was built and evaluated using a fixed training:test split (90:10). The training:test split was from Nguyen’s study, with proportional data from different nucleophile types. Each hyperparameter combination was tested 10 times to get average metrics.…”
Section: Experiments and Results (mentioning)
confidence: 99%
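The search procedure described in this statement (every hyperparameter combination evaluated several times on a fixed split, then averaged) can be sketched as follows. This is only an illustration of the pattern: `toy_evaluate`, the grid keys `r_cut` and `n_blocks`, and all values are hypothetical stand-ins, not the citing paper's model or settings.

```python
import itertools
import random
import statistics


def average_metric_per_combination(grid, evaluate, n_repeats=10):
    """Average evaluate(params, seed) over n_repeats runs for each grid point."""
    names = list(grid)
    results = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, seed=run) for run in range(n_repeats)]
        results[values] = statistics.mean(scores)
    return results


def toy_evaluate(params, seed):
    """Hypothetical error metric (lower is better); stands in for a training run."""
    noise = random.Random(seed).uniform(0.0, 0.01)
    return abs(params["r_cut"] - 5.0) + 0.1 * params["n_blocks"] + noise


# Hypothetical search grid; names and values are assumptions for illustration.
grid = {"r_cut": [4.0, 5.0], "n_blocks": [1, 2]}
results = average_metric_per_combination(grid, toy_evaluate, n_repeats=10)
best = min(results, key=results.get)
print(best, results[best])
```

Averaging over repeated runs reduces the influence of random initialization on the comparison between combinations.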
“…In addition to electronic descriptors, such as the highest occupied molecular orbital (HOMO) energy, some structural descriptors such as the buried volume were also calculated based on the optimized structure of the intermediate from the reaction of the nucleophile with a proton. Very recently, Nguyen and co-workers curated a more complex data set consisting of 904 nucleophiles in the Mayr reactivity database, covering a wide range of chemical space (17 types of nucleophiles). An Extra Trees (ET)-based model was developed to predict the nucleophilicity parameter.…”
Section: Introduction (mentioning)
confidence: 99%
“…[19] For initial model screening the dataset was split randomly into 90% training and 10% test sets as described in the previous approaches [26]. Since overfitting and underfitting problems are inherent to advanced ML models, several remedies have been devised to overcome these issues. k-Fold cross validation is one such method where training data is randomly split into k number of folds, followed by training the model on k − 1 folds and validation on the remaining fold.…”
Section: Results (mentioning)
confidence: 99%
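The k-fold procedure described in this statement can be sketched with the standard library alone. The helper name `k_fold_indices` and the sample counts are assumptions for illustration; the citing paper's actual implementation is not shown in the excerpt.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        # Train on the other k - 1 folds, validate on fold i.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each sample is used for validation exactly once, so the averaged validation score draws on the whole training set rather than a single held-out slice.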
“…Several quantitative structure activity relationship studies and ML methods have proven quantum chemical and thermodynamic descriptors to be important tools for predicting various chemical and physical properties of molecules [23–26, 30]. For our predictive modelling application, density functional theory (DFT) based computational approaches were used as they enable the generation of various electronic and thermodynamic descriptors in the most efficient and reliable manner [31, 32]. Since the solvents employed in our study belong to 13 diverse categories, choosing appropriate descriptors that accurately represents the wholesome properties of these molecules was a big challenge.…”
Section: Dataset and Descriptors (mentioning)
confidence: 99%