The development of efficient models for predicting specific properties
through machine learning is of great importance for the innovation
in chemistry and materials science. However, predicting global electronic
structure properties, such as the frontier molecular orbital energies (the highest occupied
molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO)
energy levels) and their HOMO–LUMO gaps, for larger molecules from small-molecule
data remains a challenge. Here, we develop
a multilevel attention neural network, named DeepMoleNet, that fuses
chemically interpretable insights into multitask learning
by (1) weighting contributions from various atoms and (2) taking
the atom-centered symmetry functions (ACSFs) as the teacher descriptor.
Efficient prediction of 12 properties, including dipole moment,
HOMO, and Gibbs free energy, within chemical accuracy is achieved on
multiple benchmarks at both equilibrium and nonequilibrium
geometries, including up to 110,000 records of data in QM9, 400,000
records in MD17, and 280,000 records in ANI-1ccx for random split
evaluation. Good transferability for predicting larger molecules
outside the training set is demonstrated in both equilibrium QM9 and
Alchemy data sets at the density functional theory (DFT) level. Additional
tests on nonequilibrium molecular conformations from the DFT-based MD17
data set and the ANI-1ccx data set with coupled cluster accuracy, as well
as on public test sets of singlet fission molecules, biomolecules,
long oligomers, and protein with up to 140 atoms, show reasonable predictions
for thermodynamic and electronic structure properties. The proposed
multilevel attention neural network is applicable to high-throughput
screening of numerous chemical species in both equilibrium and nonequilibrium
molecular spaces to accelerate rational designs of drug-like molecules,
material candidates, and chemical reactions.
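
The two ingredients named in the abstract above, atom-centered symmetry functions and per-atom weighting, can be illustrated with a minimal sketch. It assumes the standard radial (G2-type) ACSF with a cosine cutoff and a plain softmax over per-atom scores. The function names, parameter values, and coordinates are hypothetical and are not taken from DeepMoleNet, whose actual architecture is not described here.

```python
import math

def cutoff(r, r_c):
    """Cosine cutoff: equals 1 at r = 0 and decays smoothly to 0 at r = r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_acsf(coords, i, eta, r_s, r_c):
    """Radial (G2-type) symmetry function of atom i.

    G2_i = sum over j != i of exp(-eta * (r_ij - r_s)^2) * f_c(r_ij)
    """
    total = 0.0
    for j, xyz in enumerate(coords):
        if j == i:
            continue
        r_ij = math.dist(coords[i], xyz)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c)
    return total

def attention_pool(atom_values, atom_scores):
    """Softmax-weighted sum of per-atom contributions (illustrative only)."""
    shift = max(atom_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in atom_scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    pooled = sum(w * v for w, v in zip(weights, atom_values))
    return pooled, weights

# Hypothetical three-atom geometry (angstrom); the values are made up.
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
descriptor = [radial_acsf(coords, i, eta=1.0, r_s=1.0, r_c=6.0) for i in range(3)]
pooled, weights = attention_pool([0.2, -0.1, -0.1], [1.5, 0.3, 0.3])
```

The softmax weights sum to one, so they can be read as the relative contribution of each atom to the pooled value; this is one common way to make an atomwise model inspectable, not necessarily the one the authors use.
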
We have implemented the calculation of NMR parameters within the
generalized energy-based fragmentation (GEBF) method for condensed-phase
systems with periodic boundary conditions (PBC). In this PBC-GEBF
approach, NMR parameters of molecules in a unit cell are assembled
as a linear combination of the corresponding quantities from a series
of small embedded subsystems. To treat condensed-phase systems containing
large molecules, we propose a novel “fragment-based”
strategy for building subsystems, while our previously reported “molecule-based”
strategy for construction of subsystems is appropriate for periodic
systems with small molecules. The “fragment-based” strategy
in PBC-GEBF is demonstrated to be much more efficient than its “molecule-based”
counterpart to treat crystals of large molecules. With the “molecule-based”
PBC-GEBF method, we obtained consistently good NMR parameters of liquid
water with B3LYP on top of neural-network-potential-based ab initio molecular dynamics (AIMD) snapshots. With the
“fragment-based” PBC-GEBF approach, we predicted the ¹H chemical shifts of a large macrocycle in solution based
on a series of classical MD snapshots. The calculated results are
in good accord with the experimental chemical shifts. Therefore, the
PBC-GEBF method is expected to be a reliable and efficient tool for
predicting NMR parameters of large complex systems in solution.
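
The assembly step described above, in which a property of the full system is a linear combination of the corresponding subsystem quantities, can be sketched as follows. This is only an illustration of the bookkeeping: the coefficients, atom labels, and shielding values are hypothetical, and the GEBF rules that generate the subsystems and their coefficients are not reproduced here.

```python
def assemble_property(subsystems):
    """Combine per-atom values from subsystems: P[a] = sum_k c_k * p_k[a].

    `subsystems` is a list of (coefficient, {atom_label: value}) pairs.
    An atom contributes only through the subsystems that contain it.
    """
    total = {}
    for coeff, values in subsystems:
        for atom, value in values.items():
            total[atom] = total.get(atom, 0.0) + coeff * value
    return total

# Hypothetical example: two overlapping subsystems with coefficient +1 and
# their overlap region with coefficient -1, so the shared atom is counted once.
subsystems = [
    (1.0, {"H1": 30.0, "H2": 31.0}),
    (1.0, {"H2": 31.4, "H3": 29.0}),
    (-1.0, {"H2": 31.2}),
]
shieldings = assemble_property(subsystems)
```

For every atom, the coefficients of the subsystems that contain it sum to one in this toy case, which is what keeps the shared atom from being double counted.
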
An on-the-fly fragment-based machine learning (ML) approach was developed to construct machine learning force fields for large complex systems. In this approach, the energy, forces, and molecular properties of the target system are obtained by combining the machine learning force fields of various subsystems with the generalized energy-based fragmentation (GEBF) approach. Using a nonparametric Gaussian process (GP) model, all the subsystem force fields are generated automatically online, without data selection or parameter optimization. With the GEBF-ML force field constructed for a normal alkane, C60H122, long-time molecular dynamics (MD) simulations are performed on alkanes of different sizes, and the predicted energies, forces, and molecular properties (dipole moments) compare favorably with full quantum mechanics (QM) calculations. The predicted IR spectra also show excellent agreement with the direct ab initio MD results. Our results demonstrate that the GEBF-ML method provides an automatic and efficient way to build force fields for a broad range of complex systems such as biomolecules and supramolecular systems.
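
As a rough illustration of the nonparametric Gaussian process model mentioned above, the sketch below implements GP regression (posterior mean only) with a squared-exponential kernel on a one-dimensional toy input. The kernel choice, length scale, noise level, and data are assumptions made for the example; a real force field would use molecular descriptors as inputs and would also predict forces, which this sketch does not attempt.

```python
import math

def rbf(x1, x2, length=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def gp_fit(xs, ys, noise=1e-8, length=1.0):
    """Return alpha = (K + noise * I)^-1 y for the training data."""
    k = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    return solve(k, ys)

def gp_predict(x_new, xs, alpha, length=1.0):
    """Posterior mean at x_new: sum_i alpha_i * k(x_new, x_i)."""
    return sum(a * rbf(x_new, x, length) for a, x in zip(alpha, xs))

# Toy data (hypothetical): energies sampled along one coordinate.
train_x = [0.0, 1.0, 2.0]
train_y = [0.0, 1.0, 4.0]
alpha = gp_fit(train_x, train_y)
estimate = gp_predict(1.5, train_x, alpha)
```

Because the model is defined by the training points themselves rather than by a fixed set of fitted parameters, new reference data can be added simply by extending `train_x` and `train_y` and refitting, which is what makes GP models convenient for online construction.
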
Electrochemical organic synthesis has attracted increasing attention as a sustainable and versatile synthetic platform. Quantitative assessment of electro‐organic reactions, including reaction thermodynamics, electro‐kinetics, and coupled chemical processes, can provide effective analytical tools to guide their future design. Herein, we demonstrate that electrochemical parameters such as onset potential, Tafel slope, and effective voltage can be used as electro‐descriptors for evaluating reaction conditions and predicting reactivities (yields). An “electro‐descriptor diagram” is generated, in which reactive and non‐reactive conditions/substances are separated by a distinct boundary. Reaction outcomes have been successfully predicted using the electro‐descriptor diagram or machine learning algorithms with experimentally derived electro‐descriptors. This method represents a promising tool for data acquisition, reaction prediction, mechanistic investigation, and high‐throughput screening for general organic electro‐synthesis.
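
One of the electro‐descriptors named above, the Tafel slope, is conventionally obtained as the slope of potential against the base-10 logarithm of the current. A minimal sketch of that fit is given below, assuming an ordinary least-squares line through data from the Tafel region. The function name and the data points are hypothetical and synthetic; the authors' actual extraction procedure is not given in the abstract.

```python
import math

def tafel_slope(currents, potentials):
    """Least-squares fit of E = a + b * log10(|j|).

    Returns (b, a): the Tafel slope b in volts per decade of current
    and the intercept a in volts.
    """
    xs = [math.log10(abs(j)) for j in currents]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(potentials) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, potentials))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data: currents in A, potentials in V (made up for illustration).
currents = [1e-5, 1e-4, 1e-3]
potentials = [0.10, 0.22, 0.34]
slope, intercept = tafel_slope(currents, potentials)
```

A descriptor computed this way is a single number per reaction condition, so it can be used directly as one axis of a descriptor diagram or as an input feature for a classifier.
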
Intermolecular interactions in terms of molecular packing are crucial for the investigation of the absorption spectra of uracil in different environments.