2021
DOI: 10.1021/acs.jctc.1c00363

Impact of the Characteristics of Quantum Chemical Databases on Machine Learning Prediction of Tautomerization Energies

Abstract: An essential aspect for adequate predictions of chemical properties by machine learning models is the database used for training them. However, studies that analyze how the content and structure of the databases used for training impact the prediction quality are scarce. In this work, we analyze and quantify the relationships learned by a machine learning model (Neural Network) trained on five different reference databases (QM9, PC9, ANI-1E, ANI-1, and ANI-1x) to predict tautomerization energies from molecules…

Cited by 20 publications (26 citation statements)
References 86 publications
“…Tautomerization free energies have been recently studied with an ML/MM model based on ANI. In addition, Vazquez-Salazar et al have recently explored the impact of the diversity of training set on the ability to compute relative tautomer energies in a public data set, Tautobase, and we use the same data set here. Tautobase consists of 1673 tautomer pairs stored as SMIRKS strings.…”
Section: Results (mentioning)
confidence: 99%
“…Despite not being trained on tautomers, the reported RMSE was 1.5 kcal/mol compared with the reference QM data. Similarly, Meuwly 55 and co-workers concluded that the ANI-1x model was the best ML potential among the five benchmarked models on the Tautobase dataset 56 with RMSE 2.85 kcal/mol. Here we benchmarked ANI-2xt for a Gibbs free energy calculation task with B97-3c (Fig.…”
Section: Tautomerization Energy (mentioning)
confidence: 87%
“…Despite not being trained on tautomers, the reported RMSE was 1.5 kcal/mol compared with the reference QM data. Similarly, Meuwly 54 and co-workers concluded that the ANI-1x model was the best ML potential among the five benchmarked models on the Tautobase dataset 55 with an RMSE of 2.85 kcal/mol. Here we benchmarked ANI-2xt for a Gibbs free energy calculation task with B97-3c (Fig.…”
Section: Tautomerization Energy (mentioning)
confidence: 88%