Conspectus
Machine learning interatomic potentials (MLIPs) are widely used
for describing molecular energies and continue to bridge the speed and
accuracy gap between quantum mechanical (QM) and classical approaches
such as force fields. In this Account, we focus on out-of-the-box
approaches to developing transferable MLIPs for diverse chemical tasks.
First, we introduce the “Accurate Neural Network engine for
Molecular Energies” (ANAKIN-ME, or ANI for short) method.
The ANI model utilizes Justin Smith Symmetry Functions (JSSFs) and
enables training on vast data sets. Training data sets several
orders of magnitude larger than before have become the key factor in
the transferability and flexibility of MLIPs. Because the quantity,
quality, and types of interactions included in the training data set
dictate the accuracy of an MLIP, the task of proper data selection
and model training can be assisted with advanced methods such as active
learning (AL), transfer learning (TL), and multitask learning (MTL).
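The JSSFs are modified Behler–Parrinello symmetry functions; as an illustration (the standard radial form, not reproduced from this Account), the radial descriptor probing the environment of atom i can be written as

\[
G_i^{R} = \sum_{j \neq i} e^{-\eta (R_{ij} - R_s)^2} f_C(R_{ij}),
\qquad
f_C(R) =
\begin{cases}
\tfrac{1}{2}\cos\!\left(\dfrac{\pi R}{R_C}\right) + \tfrac{1}{2}, & R \le R_C \\[4pt]
0, & R > R_C
\end{cases}
\]

where η and R_s set the width and center of each Gaussian probe and R_C is the cutoff radius.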
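A common realization of AL for MLIPs is query by committee: an ensemble of models votes on each candidate structure, and only structures with high ensemble disagreement are sent for QM labeling. The sketch below illustrates the idea; the model objects and the disagreement threshold are illustrative placeholders, not the actual ANI implementation.

```python
import numpy as np

def select_for_labeling(models, candidate_pool, threshold=0.05):
    """Return candidates whose ensemble energy disagreement exceeds threshold.

    models:         list of callables mapping a conformer -> predicted energy
    candidate_pool: list of conformers (any representation the models accept)
    threshold:      disagreement cutoff in the models' energy units
    """
    selected = []
    for conformer in candidate_pool:
        energies = np.array([model(conformer) for model in models])
        if energies.std() > threshold:  # committee disagrees -> label with QM
            selected.append(conformer)
    return selected

# Toy usage: an "ensemble" of noisy linear models over a 1-D descriptor.
rng = np.random.default_rng(0)
models = [lambda x, w=rng.normal(1.0, 0.1): w * x for _ in range(8)]
pool = list(rng.uniform(0.0, 10.0, size=100))
print(f"{len(select_for_labeling(models, pool))} conformers flagged for QM labeling")
```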
Next, we describe the AIMNet (“Atoms-in-Molecules Network”)
model, which was inspired by the quantum theory of atoms in molecules. The
AIMNet architecture lifts multiple limitations of MLIPs: it encodes
long-range interactions and learnable representations of chemical
elements. We also discuss the AIMNet-ME model, which expands the applicability
domain of AIMNet from neutral molecules toward open-shell systems.
AIMNet-ME incorporates a dependence of the potential on molecular
charge and spin. It brings ML and physical models one step closer,
ensuring correct behavior of the molecular energy with respect to the total
molecular charge.
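One way to state the exact condition alluded to here (our framing, not a formula from this Account) is the piecewise-linearity requirement for the energy as a function of electron number: between adjacent integer-charge states, the exact energy interpolates linearly,

\[
E(N_0 + \omega) = (1-\omega)\,E(N_0) + \omega\,E(N_0 + 1), \qquad 0 \le \omega \le 1,
\]

with derivative discontinuities at integer N. A charge-aware model such as AIMNet-ME can be assessed against this kind of behavior.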
Finally, we describe perhaps the simplest possible physics-aware
model, which combines ML with the extended Hückel method. In
ML-EHM, the “Hierarchically Interacting Particle Neural Network”
(HIP-NN) generates the set of molecule- and environment-dependent
Hamiltonian elements α_μμ and K_μν. As a test example, we show how, in
contrast to traditional Hückel theory, ML-EHM correctly describes
orbital crossing upon bond rotation. Hence, it learns the underlying
physics, highlighting that the inclusion of proper physical constraints
and symmetries can significantly improve ML model generalization.
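For reference, in extended Hückel theory the off-diagonal Hamiltonian elements are conventionally built with the Wolfsberg–Helmholz approximation,

\[
H_{\mu\nu} = \frac{K}{2}\left(\alpha_{\mu\mu} + \alpha_{\nu\nu}\right) S_{\mu\nu},
\]

where S_μν is the orbital overlap and K is a fixed empirical constant (≈1.75). In ML-EHM, by contrast, HIP-NN predicts molecule- and environment-dependent values of these parameters.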
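The bond-rotation example can be illustrated with a two-orbital toy model: in a purely topological Hückel picture the coupling ignores the twist, whereas a geometry-aware coupling that scales with the p-orbital overlap closes the π/π* gap at 90°, producing the orbital crossing. The sketch below uses illustrative parameter values and is not the actual ML-EHM code.

```python
import numpy as np

# Two p_z orbitals on a twisting C=C bond (ethylene-like pi system).
alpha, beta = -11.4, -0.8  # on-site and coupling parameters, eV (illustrative)

for theta_deg in (0, 30, 60, 90):
    theta = np.radians(theta_deg)
    # Topological Hueckel: the coupling ignores the twist entirely.
    H_fixed = np.array([[alpha, beta],
                        [beta, alpha]])
    # Geometry-aware coupling: p-orbital overlap scales as cos(theta),
    # so the pi/pi* gap closes and the orbitals cross at 90 degrees.
    H_geom = np.array([[alpha, beta * np.cos(theta)],
                       [beta * np.cos(theta), alpha]])
    e_fixed = np.linalg.eigvalsh(H_fixed)
    e_geom = np.linalg.eigvalsh(H_geom)
    print(f"theta={theta_deg:3d}  fixed gap={e_fixed[1] - e_fixed[0]:5.2f} eV"
          f"  geometry-aware gap={e_geom[1] - e_geom[0]:5.2f} eV")
```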