Partial atomic charge assignment is of immense practical value to force field parametrization, molecular docking, and cheminformatics. Machine learning has emerged as a powerful tool for modeling chemistry at unprecedented computational speeds given ground-truth values, but for the task of charge assignment, the choice of ground truth may not be obvious. In this letter, we use machine learning to discover a charge model by training a neural network to reproduce molecular dipole moments using a large, diverse set of CHNO molecular conformations. The new model, called Affordable Charge Assignment (ACA), is computationally inexpensive and predicts dipoles of out-of-sample molecules accurately. Furthermore, dipole-inferred ACA charges are transferable to dipole and even quadrupole moments of molecules much larger than those used for training. We apply ACA to long dynamical trajectories of biomolecules and successfully produce their infrared spectra. Additionally, we compare ACA with existing charge models and find that ACA assigns charges similar to those of Charge Model 5, but at a greatly reduced computational cost.
Graphical TOC Entry (figure labels: Extensibility Tests, Molecular Size, Training Dataset)
Keywords: machine learning, neural networks, quantum chemistry
Electrostatic interactions contribute strongly to the forces within and between molecules. These interactions depend on the charge density field ρ(r), which is computationally demanding to compute. Simplified models of the charge density, such as atom-centered monopoles, are commonly employed. Partial atomic charges both speed up computation and provide a qualitative understanding of the underlying chemistry. [1][2][3][4] However, the decomposition of charge density into atomic charges is, by itself, an ambiguous task. Additional principles are necessary to make the charge assignment task well-defined. Here we show that a machine learning model, trained only on the dipole moments of small molecules, discovers a charge model that is transferable to quadrupole predictions and extensible to much larger molecules.
Existing popular charge models have also been designed to reproduce observables of the electrostatic potential. The Merz-Singh-Kollman (MSK) [5,6] charge model exactly replicates the dipole moment and approximates the electrostatic potential at many points surrounding the molecule, resulting in high-quality electrostatic properties exterior to the molecule. However, MSK suffers from basis set sensitivity, particularly for "buried atoms" located inside large molecules. [7][8][9] Charge Model 5 (CM5) [8] is an extension of Hirshfeld analysis, [10] with additional parametrization to approximately reproduce 614 ab initio and experimental gas-phase dipole moments. Unlike MSK, Hirshfeld and CM5 are nearly independent of basis set. [9] This insensitivity allows CM5 to use a single set of model parameters. The corresponding tradeoff is that its charges do not reproduce electrostatic fields as well as
MSK.
A limitation of these conventional charge models is...
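To make the atom-centered monopole picture discussed above concrete, the following minimal sketch computes the dipole and a traceless quadrupole from a set of partial charges. It is an illustration only, not the ACA implementation: the function names, the water-like geometry, and the charge values are all assumptions chosen for the example, and the quadrupole uses one common traceless convention (other conventions differ by a constant factor).

```python
# Illustrative sketch only: the function names, geometry, and charges below
# are assumptions for this example, not values or code from the paper.

def dipole(charges, coords):
    """Dipole of point charges: mu = sum_i q_i * r_i (units: e * length)."""
    return tuple(
        sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3)
    )

def quadrupole(charges, coords):
    """Traceless quadrupole: Q_ab = sum_i q_i * (3 r_a r_b - |r|^2 delta_ab)."""
    Q = [[0.0] * 3 for _ in range(3)]
    for q, r in zip(charges, coords):
        r2 = sum(x * x for x in r)
        for a in range(3):
            for b in range(3):
                Q[a][b] += q * (3.0 * r[a] * r[b] - (r2 if a == b else 0.0))
    return Q

# Hypothetical water-like geometry (angstrom) with made-up partial charges (e).
coords = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
charges = [-0.8, 0.4, 0.4]  # sums to zero, so the molecule is neutral

mu = dipole(charges, coords)
Q = quadrupole(charges, coords)
```

For a neutral molecule the dipole computed this way does not depend on the choice of origin. The dipole and the total charge together supply only four linear constraints, so for molecules with more than four atoms many different charge sets reproduce the same dipole, which is one way to see why additional principles are needed to make charge assignment well-defined.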