We provide a definition and explicit expressions for n-body Gaussian Process (GP) kernels, which can learn any interatomic interaction occurring in a physical system, up to n-body contributions, for any value of n. The series is complete, as it can be shown that the "universal approximator" squared exponential kernel can be written as a sum of n-body kernels. These recipes enable the choice of optimally efficient force models for each target system, as confirmed by extensive testing on various materials. We furthermore describe how the n-body kernels can be "mapped" on equivalent representations that provide database-size-independent predictions and are thus crucially more efficient. We explicitly carry out this mapping procedure for the first non-trivial (3-body) kernel of the series, and we show that this reproduces the GP-predicted forces with meV/Å accuracy while being orders of magnitude faster. These results pave the way to using novel force models (here named "M-FFs") that are computationally as fast as their corresponding standard parametrised n-body force fields, while retaining the nonparametric character, the ease of training and validation, and the accuracy of the best recently proposed machine learning potentials.
We assess Gaussian process (GP) regression as a technique to model interatomic forces in metal nanoclusters by analyzing the performance of 2-body, 3-body, and many-body kernel functions on a set of 19-atom Ni cluster structures. We find that 2-body GP kernels fail to provide faithful force estimates, despite succeeding in bulk Ni systems. However, both 3- and many-body kernels predict forces within an ∼0.1 eV/Å average error even for small training datasets and achieve high accuracy even on out-of-sample, high temperature structures. While training and testing on the same structure always provide satisfactory accuracy, cross-testing on dissimilar structures leads to higher prediction errors, posing an extrapolation problem. This can be cured using heterogeneous training on databases that contain more than one structure, which results in a good trade-off between versatility and overall accuracy. Starting from a 3-body kernel trained this way, we build an efficient non-parametric 3-body force field that allows accurate prediction of structural properties at finite temperatures, following a newly developed scheme [A. Glielmo et al., Phys. Rev. B 95, 214302 (2017)]. We use this to assess the thermal stability of Ni nanoclusters at a fractional cost of full ab initio calculations.
Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Finding a small set of features that still retains sufficient information about the dataset is important for the successful application of many statistical learning approaches. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This ranking can in turn be used to identify the most informative distance measure and, therefore, the most informative set of features, out of a pool of candidates. To illustrate the general applicability of our approach, we show that it reproduces the known importance ranking of policy variables for Covid-19 control, and also identifies compact yet informative descriptors for atomic structures. We further provide initial evidence that the information asymmetry measured by the proposed test can be used to infer relationships of causality between the features of a dataset. The method is general and should be applicable to many branches of science.
Machine learning algorithms have recently emerged as a tool to generate force fields which display accuracies approaching the ones of the ab-initio calculations they are trained on, but are much faster to compute. The enhanced computational speed of machine learning force fields results key for modelling metallic nanoparticles, as their fluxionality and multi-funneled energy landscape needs to be sampled over long time scales. In this review, we first formally introduce the most commonly used machine learning algorithms for force field generation, briefly outlining their structure and properties. We then address the core issue of training database selection, reporting methodologies both already used and yet unused in literature. We finally report and discuss the recent literature regarding machine learning force fields to sample the energy landscape and study the catalytic activity of metallic nanoparticles.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.