2012
DOI: 10.1103/physrevlett.108.058301
Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning

Abstract: We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a non-linear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross-validation over more than seven thousand small organic molecules yields a mean absolute er…
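The "nuclear charges and atomic positions only" input described above is encoded in the Coulomb-matrix representation before regression. Below is a minimal sketch of that descriptor in Python; the eigenvalue-spectrum sorting, the zero-padding to a fixed length, and the function names are illustrative assumptions rather than the paper's exact pipeline.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix for one molecule.

    Z : (n,) nuclear charges; R : (n, 3) positions in bohr.
    Off-diagonal entries are the internuclear repulsion Z_i Z_j / |R_i - R_j|;
    diagonal entries 0.5 * Z_i**2.4 encode the free-atom contribution.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        M = np.outer(Z, Z) / dist          # diagonal becomes inf, overwritten next
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M

def spectrum_descriptor(Z, R, max_atoms):
    """Sorted eigenvalue spectrum of the Coulomb matrix, zero-padded to a fixed
    length so molecules of different size share one feature vector (one common
    choice, assumed here for illustration)."""
    eigvals = np.linalg.eigvalsh(coulomb_matrix(Z, R))
    eigvals = np.sort(np.abs(eigvals))[::-1]   # sort by magnitude, descending
    return np.pad(eigvals, (0, max_atoms - len(eigvals)))
```

For methane, for example, one would pass Z = [6, 1, 1, 1, 1] and the five atomic positions; the resulting fixed-length vector can then be fed to a regressor such as the kernel ridge model sketched further below.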

Cited by 2,024 publications (1,963 citation statements)
References 37 publications
“…2. (a) BAML and polarizability representation based ML learning curves for 9 molecular properties of 6k constitutional isomers of formula C7H10O2. [31] Results for Coulomb matrix (CM) [7] and bag-of-bonds (BoB) [23] are shown for comparison. (b) BAML learning curves for 134k QM9 molecules for the same 9 molecular properties.…”
mentioning
confidence: 99%
“…[1] Alternatively, Kernel-Ridge-Regression (KRR) based machine learning (ML) models [2] can also infer the observable in terms of a linear expansion in chemical compound space. [3][4][5][6] More specifically, any observable can be estimated using O_inf(M) = Σ_i^N α_i k(d(M, M_i)), where k is the kernel function (e.g., Laplacian with training set dependent width), M is the molecular representation (typically in matrix or vector format), [7,8] and d is a metric (often the L1-norm). The sum runs over all reference molecules i used for training to obtain regression weights {α_i}.…”
mentioning
confidence: 99%
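The expansion quoted in this snippet, O_inf(M) = Σ_i^N α_i k(d(M, M_i)), is standard kernel ridge regression: the weights α are obtained by solving a regularized linear system on the training kernel matrix. Below is a minimal sketch under the choices named in the quote (Laplacian kernel, L1 metric); the hyperparameter values, the vectorized molecular representations, and the function names are assumptions for illustration, not a definitive implementation.

```python
import numpy as np

def l1_distances(A, B):
    """Pairwise L1 distances between rows of A (n, f) and B (m, f)."""
    return np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)

def laplacian_kernel(D, sigma):
    """k(d) = exp(-d / sigma), applied elementwise to a distance matrix D."""
    return np.exp(-D / sigma)

def krr_fit(X_train, y_train, sigma=100.0, lam=1e-8):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = laplacian_kernel(l1_distances(X_train, X_train), sigma)
    return np.linalg.solve(K + lam * np.eye(len(y_train)), y_train)

def krr_predict(X_test, X_train, alpha, sigma=100.0):
    """O(M) = sum_i alpha_i * k(d(M, M_i)) over the training representations."""
    K = laplacian_kernel(l1_distances(X_test, X_train), sigma)
    return K @ alpha
```

With the spectrum_descriptor sketched earlier as the representation, X_train would be an (n_train, max_atoms) array of descriptors and y_train the reference atomization energies; krr_predict then evaluates the quoted expansion for each query molecule.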
“…In many systems, pure computing power and algorithms are insufficient to obtain results within a reasonable time frame. At the same time, full mathematical models involving the full complexity of the system at hand are also computationally intractable, for example in quantum physics [65][66][67][68][69]. Building such hybrid strategies, we expect, will continue to be exciting research directions, at the interface between Statistics, Computer Science and application domains, see, e.g.…”
Section: Conclusion and Discussion
mentioning
confidence: 99%
“…[11] Trained to reference datasets, ML models can predict energies, forces, and other molecular properties. [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27] They have been used to discover materials [28][29][30][31][32][33][34][35][36][37] and study dynamical processes such as charge and exciton transfer. [38][39][40][41] Most related to this work are ML models of existing charge models, [9][42][43][44] which are orders of magnitude faster than ab initio calculation.…”
Section: Msk
mentioning
confidence: 99%