2016
DOI: 10.1063/1.4964627
Communication: Understanding molecular representations in machine learning: The role of uniqueness and target similarity

Abstract: The predictive accuracy of Machine Learning (ML) models of molecular properties depends on the choice of the molecular representation. Based on the postulates of quantum mechanics, we introduce a hierarchy of representations which meet uniqueness and target similarity criteria. To systematically control target similarity, we rely on interatomic many body expansions, as implemented in universal force-fields, including Bonding, Angular, and higher order terms (BA). Addition of higher order contributions systemat…


Cited by 307 publications (355 citation statements)
References 31 publications
“…[40][41][42][43] ML provides a method to identify regularities and correlations in huge datasets, which yield new scientific insights without the need for any prior particular physical knowledge. [44,45,47] Typically, ML algorithms are trained using data obtained from ab initio calculations.…”
Section: Introduction
Mentioning confidence: 99%
“…Empirically, learning curves are known to have an A·N_train^(−β) dependence. 70 In the training of neural networks typically 1 < β < 2, 71 while we found values between 0.15 and 0.30 for the properties studied. Since different types of models can have different learning curves, we also looked at random forest, where we found similar plots with β ≈ 0.20.…”
Section: Residual Analysis
Mentioning confidence: 65%
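The power law quoted above can be made concrete with a minimal sketch: given error measurements at several training-set sizes, the exponent β is recovered from a linear fit in log-log space. The data and the values of A and β below are synthetic illustrations, not figures from the cited work.

```python
import numpy as np

def fit_learning_curve(n_train, errors):
    """Fit log(error) = log(A) - beta * log(N_train); return (A, beta)."""
    slope, intercept = np.polyfit(np.log(n_train), np.log(errors), 1)
    return np.exp(intercept), -slope

# Noiseless synthetic learning curve: error = A * N_train**(-beta)
n_train = np.array([100, 200, 400, 800, 1600])
true_A, true_beta = 2.0, 0.25
errors = true_A * n_train ** (-true_beta)

A, beta = fit_learning_curve(n_train, errors)
print(round(A, 3), round(beta, 3))  # recovers A = 2.0, beta = 0.25
```

With noisy real data the same fit gives a least-squares estimate of β, which is how exponents in the quoted range (0.15–0.30 for the cited properties, 1–2 for typical neural networks) are usually extracted.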
“…Figure 4 shows the learning curves, constructed from δ_RMS, for the baseline ML force fields (i.e., those of classes (I) and (II), which are based on random data sampling) created for Al, Cu, Ti, W, Si, and C using the optimized V_{i,α} and V′_{i,α}. For both fingerprints, δ_RMS scales inversely with N_t initially 39 but reaches a limit at N_t ≳ 500. By using V_{i,α} instead of V′_{i,α}, δ_RMS drops from ≃0.16 eV/Å to ≃0.09 eV/Å in case of C. Significant improvement was also obtained for W, Ti, and Cu.…”
Section: Fingerprinting
Mentioning confidence: 97%
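A brief sketch of the δ_RMS force-error metric referenced above, taken here as the standard root-mean-square deviation between predicted and reference atomic force components (the cited paper's exact definition may differ slightly); the arrays are synthetic placeholders.

```python
import numpy as np

def delta_rms(f_pred, f_ref):
    """RMS error over all force components, in the units of the inputs (e.g. eV/Angstrom)."""
    diff = np.asarray(f_pred) - np.asarray(f_ref)
    return np.sqrt(np.mean(diff ** 2))

# Two atoms, three force components each (placeholder values)
f_ref = np.array([[0.10, -0.20, 0.05],
                  [0.00, 0.15, -0.10]])
f_pred = f_ref + 0.02  # uniform 0.02 offset on every component

print(round(delta_rms(f_pred, f_ref), 3))  # 0.02
```

Plotting this quantity against the number of training structures N_t produces learning curves like those described for the Al, Cu, Ti, W, Si, and C force fields.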