Machine learning has proven to be a valuable tool for approximating functions in high-dimensional spaces. Unfortunately, extracting the relevant physics from these models is rarely as straightforward as fitting them to a large data set in the first place. Here we present a description of atomic systems that generates machine learning representations with a direct path to physical interpretation. As an example, we demonstrate its usefulness as a universal descriptor of grain boundary systems. Grain boundaries in crystalline materials are a quintessential example of a complex, high-dimensional system with broad impact on many physical properties, including strength, ductility, corrosion resistance, crack resistance, and conductivity. In addition to modeling such properties, the method also provides insight into the physical "building blocks" that influence them. This opens the way to discovering the underlying physics of these behaviors by identifying which building blocks map to particular properties. Once the structures are understood, they can then be optimized for desirable behaviors.
Surrogate machine-learning models are transforming computational materials science by predicting properties of materials with the accuracy of ab initio methods at a fraction of the computational cost. We demonstrate surrogate models that simultaneously interpolate energies of different materials on a dataset of 10 binary alloys (AgCu, AlFe, AlMg, AlNi, AlTi, CoNi, CuFe, CuNi, FeV, NbNi) comprising 10 different species and all possible fcc, bcc, and hcp structures with up to 8 atoms in the unit cell, 15 950 structures in total. We find that, as the number of simultaneously modeled alloys increases, the prediction errors vary by less than 1 meV/atom. Several state-of-the-art materials representations and learning algorithms were found to agree qualitatively, with relative errors in the predicted formation enthalpy of less than 2.5% for all systems.
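As a rough illustration of what such a joint fit looks like in practice, the sketch below trains a single kernel ridge regression model on structures pooled from several alloy systems. All descriptors, targets, and hyperparameters are synthetic placeholders rather than the paper's actual representations and regressors; scikit-learn's KernelRidge merely stands in for a generic learning algorithm.

```python
# Minimal sketch (not the authors' code): fitting ONE surrogate model across
# several alloy systems at once. Descriptors and targets are random placeholders
# standing in for real structural fingerprints and DFT formation enthalpies;
# the point is the pooled, joint fit.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
alloys = ["AgCu", "AlFe", "AlMg"]           # subset of the 10 systems
X, y = [], []
for i, alloy in enumerate(alloys):
    n = 200                                 # structures per alloy (placeholder)
    desc = rng.normal(size=(n, 32))         # stand-in for a structural descriptor
    desc[:, 0] = i                          # crude "which alloy" feature
    X.append(desc)
    y.append(rng.normal(size=n))            # stand-in for formation enthalpies
X, y = np.vstack(X), np.concatenate(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.1).fit(X_tr, y_tr)
mae = np.mean(np.abs(model.predict(X_te) - y_te))
print(f"MAE on held-out structures: {mae:.3f} (placeholder units)")
```

With real data, the placeholder descriptors would be replaced by whichever materials representation is being benchmarked, and the targets by DFT formation enthalpies; the pooled training set is what lets one model interpolate across alloy systems.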
We introduce machine-learned potentials for Ag-Pd to describe the energy of alloy configurations over a wide range of compositions. We compare two different approaches. Moment tensor potentials (MTPs) are polynomial-like functions of interatomic distances and angles. The Gaussian approximation potential (GAP) framework uses kernel regression, and we use the smooth overlap of atomic positions (SOAP) representation of atomic neighborhoods, which consists of a complete set of rotational and permutational invariants provided by the power spectrum of the spherical Fourier transform of the neighbor density. Both types of potentials give excellent accuracy over a wide range of compositions, competitive with the accuracy of cluster expansion, a benchmark for this system. While both models are able to describe small deformations away from the lattice positions, SOAP-GAP excels at transferability, as shown by sensible transformation paths between configurations, and MTP, owing to its lower computational cost, allows the calculation of compositional phase diagrams. Given that both methods perform nearly as well as cluster expansion but yield off-lattice models, we expect them to open new avenues in computational materials modeling for alloys.
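The following sketch outlines the GAP-style workflow described above under simplifying assumptions: per-atom descriptor vectors (random placeholders standing in for SOAP power spectra) are compared with a normalized dot-product kernel raised to a power zeta, structure kernels are summed over atomic environments, and energies are obtained by kernel ridge regression in closed form. It is a conceptual outline, not the published potentials.

```python
# Conceptual GAP-style fit (illustrative assumptions, not the published model):
# k(p, p') = (p·p' / |p||p'|)^zeta between per-atom descriptors, summed over
# environments to compare whole structures, then kernel ridge regression.
import numpy as np

rng = np.random.default_rng(1)
zeta, lam = 2, 1e-6

def structure_kernel(envs_a, envs_b):
    """Sum of per-atom kernels between two structures (rows = atomic environments)."""
    A = envs_a / np.linalg.norm(envs_a, axis=1, keepdims=True)
    B = envs_b / np.linalg.norm(envs_b, axis=1, keepdims=True)
    return ((A @ B.T) ** zeta).sum()

# Placeholder training set: each "structure" is a set of per-atom descriptor vectors.
train = [rng.normal(size=(rng.integers(2, 9), 30)) for _ in range(50)]
energies = rng.normal(size=len(train))        # stand-in for DFT total energies

K = np.array([[structure_kernel(a, b) for b in train] for a in train])
alpha = np.linalg.solve(K + lam * np.eye(len(train)), energies)

test = rng.normal(size=(4, 30))               # one new 4-atom structure
pred = np.array([structure_kernel(test, b) for b in train]) @ alpha
print(f"predicted energy (placeholder units): {pred:.3f}")
```

In an actual GAP fit the descriptor rows would be SOAP power spectra of real atomic neighborhoods and the regression would typically use sparse kernel methods; the closed-form solve above is only meant to show where the kernel regression enters.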
Cluster expansion (CE) is effective in modeling the stability of metallic alloys, but cluster expansions sometimes fail. Failures are often attributed to atomic relaxation in the DFT-calculated data, but there is no metric for quantifying the degree of relaxation. Numerical errors can also be responsible for slow CE convergence. We studied over one hundred different Hamiltonians and identified a heuristic, based on a normalized mean-squared displacement of atomic positions in a crystal, to determine whether the effects of relaxation in CE data are too severe to build a reliable CE model. Using this heuristic, CE practitioners can determine a priori whether an alloy system can be reliably expanded in the cluster basis. We also examined the error distributions of the fitting data. We find no clear relationship between the type of error distribution and CE prediction ability, but there are clear correlations between CE formalism reliability, model complexity, and the number of significant terms in the model. Our results show that the size of the errors is much more important than their distribution.
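One plausible form of such a relaxation metric, sketched below, is the mean-squared displacement of relaxed positions from their ideal lattice sites, normalized by the squared nearest-neighbor distance of the ideal crystal. The normalization choice and the neglect of periodic images are assumptions made for illustration; the paper's exact definition may differ.

```python
# Illustrative relaxation metric (an assumed form, not necessarily the paper's):
# mean-squared displacement of relaxed atoms from their ideal lattice sites,
# normalized by the squared nearest-neighbor distance of the ideal structure.
# Periodic images are ignored for brevity.
import numpy as np

def normalized_msd(ideal, relaxed):
    """ideal, relaxed: (N, 3) Cartesian positions in the same frame."""
    disp = relaxed - ideal
    msd = np.mean(np.sum(disp**2, axis=1))
    # nearest-neighbor distance of the ideal structure
    d = np.linalg.norm(ideal[:, None, :] - ideal[None, :, :], axis=-1)
    nn = np.min(d[d > 1e-8])
    return msd / nn**2

# Toy example: a 4-atom fcc-like cell with small random relaxations
rng = np.random.default_rng(2)
ideal = np.array([[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]]) * 4.0
relaxed = ideal + 0.05 * rng.normal(size=ideal.shape)
print(f"normalized MSD: {normalized_msd(ideal, relaxed):.4f}")
```

A small value of this quantity would indicate that atoms stay close to their ideal lattice sites after relaxation, the regime in which a lattice-based cluster expansion is expected to remain reliable.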