Accurate determination of solvation free energies of neutral organic compounds from first principles

Pereyaslavets, Leonid B.; Kamath, Ganesh; Butin, Oleg; Illarionov, Alexey; Olevanov, Michael; Kurnikov, Igor V.; Sakipov, Serzhan; Leontyev, Igor; Voronina, E. N.; Gannon, Tyler; Nawrocki, Grzegorz; Darkhovskiy, Mikhail; Ivahnenko, Ilya; Kostikov, Alexander; Scaranto, Jessica; Kurnikova, Maria; Banik, Suvo; Chan, Henry; Sternberg, Michael; Sankaranarayanan, Subramanian K. R. S.; Crawford, Brad; Potoff, Jeffrey J.; Levitt, Michael; Kornberg, Roger D.; Fain, Boris

doi:10.1038/s41467-022-28041-0

Cited by 28 publications

(84 citation statements)

References 38 publications


“…GNN predictions vs experimental values for (a) water and (b) cyclohexane solvation free energies on a data set of small organic molecules from ref . GNN models were trained on solvent holdout data sets with 10% of available water or cyclohexane solvation free energy data added (see the main text).…”

Section: Results (mentioning)

confidence: 99%

“…GNN models were trained on solvent holdout data sets with 10% of available water or cyclohexane solvation free energy data added (see the main text). ARROW-FF calculated solvation free energies are shown as crosses. The gray shaded area indicates ±0.5 kcal mol⁻¹ within the experiment.…”

Section: Results (mentioning)

confidence: 99%

“…The solvation Gibbs free energy (ΔG_solv) of a molecule is an essential physicochemical property which governs its behavior in solution. Accurate prediction of a molecule’s ΔG_solv value has far-reaching applications in fields ranging from organic synthesis and battery technologies to biological processes. Molecular simulations for predicting solvation Gibbs free energy via molecular dynamics (MD) or quantum mechanical (QM) calculations, though accurate, can be a time-consuming process due to the need to parameterize new molecules and adequately sample competing solute–solvent interactions and/or entropic effects. There is much work being conducted in this area to improve the efficiency of these calculations. Nonetheless, as the chemical space of promising solvent–solute combinations increases with a growing number of desired applications, more computationally efficient approaches such as quantitative structure–activity relationship (QSAR) models and machine learning (ML) have been investigated for rapid screening of databases.…”

Section: Introduction (mentioning)

confidence: 99%


Explainable Solvation Free Energy Prediction Combining Graph Neural Networks with Chemical Intuition

Low, Coote, Izgorodina

2022

J. Chem. Inf. Model.


The prediction of a molecule’s solvation Gibbs free energy (ΔG_solv) in a given solvent is an important task which has traditionally been carried out via quantum chemical continuum methods or force field-based molecular simulations. Machine learning (ML) and graph neural networks in particular have emerged as powerful techniques for elucidating structure–property relationships. This work presents a graph neural network (GNN) for the prediction of ΔG_solv which, in addition to encoding typical atom and bond-level features, incorporates chemically intuitive, solvation-relevant parameters into the featurization process: semiempirical partial atomic charges and solvent dielectric constant. Solute–solvent interactions are included via an interaction map layer which can be visualized to examine solubility-enhancing or -decreasing interactions learnt by the model. On a test set of small organic molecules, our GNN predicts ΔG_solv in water and cyclohexane with an accuracy comparable to polarizable and ab initio generated force field methods [mean absolute error (MAE) = 0.4 and 0.2 kcal mol⁻¹, respectively], without the need for any molecular simulation. For the FreeSolv data set of hydration free energies, the test MAE is 0.7 kcal mol⁻¹. Interpretability and applicability of the model are highlighted through several examples including rationalizing the increased solubility of modified diaminoanthraquinones in organic solvents. The clear explanations afforded by our GNN allow for easy understanding of the model’s predictions, giving the experimental chemist confidence in employing ML models toward more optimized synthetic routes.


“…1. Solvation model based on density [52] (SMD) at M06-2X [78]/Def2-TZVPP [67–70] (timing for SMD implemented in Gaussian [79] not to be published). To complete the picture, we also included literature values for FreeSolv concerning the methods ARROW-PIMD8 [26], thermodynamic integration (TI) with GAFF2 [5, 6] extracted from the FreeSolv [46] database and reference interaction site model [58, 90–92] (3D-RISM).…”

Section: Fig (mentioning)

confidence: 99%

“…Ab initio molecular dynamics (AIMD) simulations not only allow studying molecules but also chemical reactions [2, 23–25]. However, they are much more costly than force fields [1, 5, 6, 26, 27] due to having to solve approximate quantum mechanical equations at every time step. To this account hybrid set-ups $E \xrightarrow{\mathrm{MD}} R \xrightarrow{\mathrm{ML}} A$ using both atomistic simulation and machine learning (ML) have been introduced uniting quantum mechanical equations with surrogate learning on the fly potentials [28–30].…”

Section: Introduction (mentioning)

confidence: 99%

Ab initio machine learning of phase space averages

Weinreich, Lemm, von Rudorff et al.

2022

Preprint


Equilibrium structures determine material properties and biochemical functions. We propose to machine learn phase-space averages, conventionally obtained by ab initio or force-field based molecular dynamics (MD) or Monte Carlo (MC) simulations. In analogy to ab initio molecular dynamics (AIMD), our ab initio machine learning (AIML) model does not require bond topologies and therefore enables a general machine learning pathway to ensemble properties throughout chemical compound space (CCS). We demonstrate AIML for predicting Boltzmann averaged structures after training on hundreds of MD trajectories. AIML output is subsequently used to train machine learning models of free energies of solvation using experimental data, reaching competitive prediction errors (MAE ∼ 0.8 kcal/mol) for out-of-sample molecules within milliseconds. As such, AIML effectively bypasses the need for MD or MC-based phase space sampling, enabling exploration campaigns throughout CCS at a much accelerated pace. We contextualize our findings by comparison to state-of-the-art methods resulting in a Pareto plot for the free energy of solvation predictions in terms of accuracy and time.
