Polarizable continuum models provide an effective electrostatic embedding model for fragment‐based chemical shift prediction in challenging systems

Self Cite

First-principles prediction of nuclear magnetic resonance chemical shifts plays an increasingly important role in the interpretation of experimental spectra, but the required density functional theory (DFT) calculations can be computationally expensive. Promising machine learning models for predicting chemical shieldings in general organic molecules have been developed previously, though the accuracy of those models remains below that of DFT. The present study demonstrates how much higher accuracy chemical shieldings can be obtained via the Δ-machine learning approach, with the result that the errors introduced by the machine learning model are only one-half to one-third the errors expected for DFT chemical shifts relative to experiment. Specifically, an ensemble of neural networks is trained to correct PBE0/6-31G chemical shieldings up to the target level of PBE0/6-311+G(2d,p). It can predict 1H, 13C, 15N, and 17O chemical shieldings with root-mean-square errors of 0.11, 0.70, 1.69, and 2.47 ppm, respectively. At the same time, the Δ-machine learning approach is 1–2 orders of magnitude faster than the target large-basis calculations. It is also demonstrated that the machine learning model predicts experimental solution-phase NMR chemical shifts in drug molecules with only modestly worse accuracy than the target DFT model. Finally, the ability to estimate the uncertainty in the predicted shieldings based on variations within the ensemble of neural network models is also assessed.
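The Δ-machine learning idea in the abstract above can be illustrated with a minimal sketch: each ensemble member predicts a correction to the small-basis shielding, the mean correction is added to the low-level value, and the spread across members serves as a rough uncertainty estimate. The function names and the toy numbers are hypothetical and are not taken from the paper; the actual models are neural networks, which are replaced here by a plain list of precomputed corrections.

```python
from statistics import mean, stdev


def delta_ml_shielding(sigma_low, corrections):
    """Return (corrected shielding, ensemble spread) in ppm.

    sigma_low   -- shielding from the cheap level of theory (assumed input)
    corrections -- one predicted correction per ensemble member (assumed input)
    """
    if len(corrections) < 2:
        raise ValueError("need at least two ensemble members for a spread")
    # Prediction: low-level value plus the ensemble-averaged correction.
    prediction = sigma_low + mean(corrections)
    # Uncertainty proxy: sample standard deviation across ensemble members.
    spread = stdev(corrections)
    return prediction, spread


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return (sum(squared) / len(squared)) ** 0.5


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    value, unc = delta_ml_shielding(60.0, [-2.0, -2.2, -1.8])
    print(f"corrected shielding: {value:.2f} ppm, spread: {unc:.2f} ppm")
```

Using the ensemble spread as an uncertainty estimate is only a proxy: a small spread means the members agree, not necessarily that they are right.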

Section: Results and Discussion (mentioning)

confidence: 88%

Section: Results (mentioning)

confidence: 97%

Predicting Density Functional Theory-Quality Nuclear Magnetic Resonance Chemical Shifts via Δ-Machine Learning

Unzueta

Greenwell

Beran

2021

Self Cite

“…Several fragmentation methods employing a QM/molecular mechanics (MM) framework and a range of density functional theory (DFT)-based methods like adjustable density matrix assembler (ADMA), fragment molecular orbital (FMO) method, combined fragmentation method (CFM), generalized energy-based fragmentation (GEBF), and systematic molecular fragmentation analysis (SMFA) have been developed by different groups to compute the NMR chemical shifts of various macromolecular systems. Almost all of these fragmentation methods are tested and benchmarked on either proteins, peptides, or molecular crystals. Only a few studies are on nucleic acids such as a recent study using electrostatically embedded generalized molecular fractionation with conjugate caps (EE-GMFCC) scheme for the excited-state properties of fluorophore RNA systems.…”

Section: Introduction (mentioning)

confidence: 99%

Accurate and Cost-Effective NMR Chemical Shift Predictions for Nucleic Acids Using a Molecules-in-Molecules Fragmentation-Based Method

Chandy

Raghavachari

2023

We have developed, implemented, and assessed an efficient protocol for the prediction of NMR chemical shifts of large nucleic acids using our molecules-in-molecules (MIM) fragment-based quantum chemical approach. To assess the performance of our approach, MIM-NMR calculations are calibrated on a test set of three nucleic acids, where the structure is derived from solution-phase NMR studies. For DNA systems with multiple conformers, the one-layer MIM method with trimer fragments (MIM1trimer) is benchmarked to get the lowest energy structure, with an average error of only 0.80 kcal/mol with respect to unfragmented full molecule calculations. The MIM1-NMRdimer calibration with respect to unfragmented full molecule calculations shows a mean absolute deviation (MAD) of 0.06 and 0.11 ppm, respectively, for 1H and 13C nuclei, but the performance with respect to experimental NMR chemical shifts is comparable to the more expensive MIM1-NMR and MIM2-NMR methods with trimer subsystems. To compare with the experimental chemical shifts, a standard protocol is derived using DNA systems with Protein Data Bank (PDB) IDs 1SY8, 1K2K, and 1KR8. Structural minimization is carried out with a hybrid mechanics/semiempirical approach, and the minimized structures are used for computations in solution with implicit and explicit–implicit solvation models in our MIM1-NMRdimer methodology. To demonstrate the applicability of our protocol, we tested it on seven nucleic acids, including structures with nonstandard residues, heteroatom substitutions (F and B atoms), and side chain mutations with a size ranging from ∼300 to 1100 atoms. The major improvement for predicted MIM1-NMRdimer calculations is obtained from structural minimizations and implicit solvation effects. A significant improvement with the explicit–implicit solvation model is observed only for two smaller nucleic acid systems (1KR8 and 7NBK), where the expensive first solvation shell is replaced by the microsolvation model, in which a single water molecule is added for each solvent-exposed amino and imino proton, along with the implicit solvation. Overall, our target accuracy of ∼0.2–0.3 ppm for 1H and ∼2–3 ppm for 13C has been achieved for large nucleic acids. The proposed MIM-NMR approach is accurate and cost-effective (linear scaling with system size), and it can aid in the structural assignments of a wide range of complex biomolecules.
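The mean absolute deviation (MAD) quoted in the abstract above, and the comparison against a per-nucleus target accuracy, can be sketched as follows. This is an illustrative helper, not the authors' code: the function names are hypothetical, the thresholds are assumed to be the upper ends of the stated target ranges (0.3 ppm for 1H, 3 ppm for 13C), and the shift values in the usage example are made up.

```python
# Assumed thresholds: upper ends of the target ranges quoted in the abstract.
TARGET_MAD_PPM = {"1H": 0.3, "13C": 3.0}


def mean_absolute_deviation(predicted, reference):
    """MAD in ppm between predicted and reference chemical shifts."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def within_target(mad, nucleus):
    """True if the MAD is at or below the assumed target for this nucleus."""
    return mad <= TARGET_MAD_PPM[nucleus]


if __name__ == "__main__":
    # Made-up 1H shifts in ppm, for illustration only.
    predicted = [7.10, 8.25, 5.90]
    experimental = [7.00, 8.40, 6.00]
    mad = mean_absolute_deviation(predicted, experimental)
    print(f"1H MAD = {mad:.3f} ppm, within target: {within_target(mad, '1H')}")
```

The same helper applies whether the reference is an unfragmented full-molecule calculation or an experimental shift; only the interpretation of the deviation changes.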

“…We note that the use of PCM-embedded calculations in combination with a many-body expansion technique has been suggested as a practical and accurate approach for modeling chemical shieldings for more challenging systems such as proteins and molecular crystals [20]. The most well-known explicit model for large environments is probably the hybrid between QM and classical molecular mechanics (MM) force fields [21]. The common implementations of QM/MM methods describe all electrostatic interactions between QM and MM regions through simple point charges.…”

Section: Introduction (mentioning)

confidence: 99%

“…In cases where homogeneous and isotropic polarization effects represent the main constituents for the solvation, these effects may be efficiently accounted for by dielectric continuum models such as the polarizable continuum model (PCM). However, continuum models neglect the discrete nature of the solvent molecules, making such models inaccurate when directional solvent–solute interactions are important. An explicit description of the solvent is needed to account for these interactions. Furthermore, explicit solvation models are more easily generalized to heterogeneous environments. We note that the use of PCM-embedded calculations in combination with a many-body expansion technique has been suggested as a practical and accurate approach for modeling chemical shieldings for more challenging systems such as proteins and molecular crystals …”

Section: Introduction (mentioning)

confidence: 99%

Nuclear Magnetic Shielding Constants with the Polarizable Density Embedding Model

Jørgensen

Reinholdt

Hedegård

et al.

2022

We extend the polarizable density embedding (PDE) model to support the calculation of nuclear magnetic resonance (NMR) shielding constants using gauge-including atomic orbitals (GIAOs) within a density functional theory (DFT) framework. The PDE model divides the total system into fragments, describing some by quantum mechanics (QM) and the others through an embedding model. The PDE model uses anisotropic polarizabilities, inter-fragment two-electron Coulomb integrals, and a non-local repulsion operator to emulate the QM effects. The terms involving Coulomb integrals are straightforwardly extended with GIAOs. In contrast, we consider two approaches to handle the gauge dependency of the non-local operator, employing either simple symmetrization or a gauge transformation. We find the latter approach to be most stable with respect to increasing the basis set size of the QM region. We examine the accuracy of the PDE model for calculating NMR shielding constants on several solutes in a water solution. The performance is compared with the classical polarizable embedding (PE) model in addition to supermolecular reference calculations. Based on these systems, we address the basis set convergence characteristics and the QM region size requirements. Furthermore, we investigate the performance of the PDE model for a system with significant electron spill-out. In many cases, we find that the PDE model outperforms the PE model, especially regarding the accuracy of nuclear shielding constants when using small QM region sizes and in systems with significant electron spill-out.