We present a novel scheme to accurately predict atomic forces as vector quantities, rather than sets of scalar components, by Gaussian Process (GP) Regression. This is based on matrix-valued kernel functions, on which we impose the requirements that the predicted force rotates with the target configuration and is independent of any rotations applied to the configuration database entries. We show that such covariant GP kernels can be obtained by integration over the elements of the rotation group SO(d) for the relevant dimensionality d. Remarkably, in specific cases the integration can be carried out analytically and yields a conservative force field that can be recast into a pair interaction form. Finally, we show that restricting the integration to a summation over the elements of a finite point group relevant to the target system is sufficient to recover an accurate GP. The accuracy of our kernels in predicting quantum-mechanical forces in real materials is investigated by tests on pure and defective Ni, Fe and Si crystalline systems.
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms of dimensionality reduction , density estimation , and clustering , and kinetic models . We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used-or can be used-to analyze molecular simulation data.
We provide a definition and explicit expressions for n-body Gaussian Process (GP) kernels, which can learn any interatomic interaction occurring in a physical system, up to n-body contributions, for any value of n. The series is complete, as it can be shown that the "universal approximator" squared exponential kernel can be written as a sum of n-body kernels. These recipes enable the choice of optimally efficient force models for each target system, as confirmed by extensive testing on various materials. We furthermore describe how the n-body kernels can be "mapped" on equivalent representations that provide database-size-independent predictions and are thus crucially more efficient. We explicitly carry out this mapping procedure for the first non-trivial (3-body) kernel of the series, and we show that this reproduces the GP-predicted forces with meV/Å accuracy while being orders of magnitude faster. These results pave the way to using novel force models (here named "M-FFs") that are computationally as fast as their corresponding standard parametrised n-body force fields, while retaining the nonparametric character, the ease of training and validation, and the accuracy of the best recently proposed machine learning potentials.
We assess Gaussian process (GP) regression as a technique to model interatomic forces in metal nanoclusters by analyzing the performance of 2-body, 3-body, and many-body kernel functions on a set of 19-atom Ni cluster structures. We find that 2-body GP kernels fail to provide faithful force estimates, despite succeeding in bulk Ni systems. However, both 3- and many-body kernels predict forces within an ∼0.1 eV/Å average error even for small training datasets and achieve high accuracy even on out-of-sample, high temperature structures. While training and testing on the same structure always provide satisfactory accuracy, cross-testing on dissimilar structures leads to higher prediction errors, posing an extrapolation problem. This can be cured using heterogeneous training on databases that contain more than one structure, which results in a good trade-off between versatility and overall accuracy. Starting from a 3-body kernel trained this way, we build an efficient non-parametric 3-body force field that allows accurate prediction of structural properties at finite temperatures, following a newly developed scheme [A. Glielmo et al., Phys. Rev. B 95, 214302 (2017)]. We use this to assess the thermal stability of Ni nanoclusters at a fractional cost of full ab initio calculations.
Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Finding a small set of features that still retains sufficient information about the dataset is important for the successful application of many statistical learning approaches. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This ranking can in turn be used to identify the most informative distance measure and, therefore, the most informative set of features, out of a pool of candidates. To illustrate the general applicability of our approach, we show that it reproduces the known importance ranking of policy variables for Covid-19 control, and also identifies compact yet informative descriptors for atomic structures. We further provide initial evidence that the information asymmetry measured by the proposed test can be used to infer relationships of causality between the features of a dataset. The method is general and should be applicable to many branches of science.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.