A minimal subnetwork is extracted from a very complex full reaction network by exploring the reaction pathways that connect reactants and products with the minimum dissociation and formation of chemical bonds. This process reduces computational cost and correctly predicts the pathway for two representative reactions.
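The idea of keeping only pathways with the fewest bond changes can be illustrated with a weighted shortest-path search. The following is a minimal sketch, not the authors' algorithm: it assumes the network is given as a dictionary whose edges carry a precomputed count of bonds broken plus bonds formed, and the names `min_bond_change_path`, `network`, and the species labels are hypothetical.

```python
import heapq


def min_bond_change_path(network, start, goal):
    """Return (total_bond_changes, path) for the cheapest route, or None.

    `network` maps a species label to a list of (neighbor, n_bond_changes)
    pairs. This is plain Dijkstra search; the edge weights are an assumed
    stand-in for "bonds dissociated plus bonds formed" in one step.
    """
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, changes in network.get(node, []):
            new_cost = cost + changes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical toy network: reactant R, intermediates I1 and I2, product P.
toy_network = {
    "R": [("I1", 2), ("I2", 4)],
    "I1": [("P", 2)],
    "I2": [("P", 1)],
}
result = min_bond_change_path(toy_network, "R", "P")
print(result)
```

In this toy case the route through `I1` is retained because it needs four bond changes in total, against five for the route through `I2`.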
Bond dissociation enthalpies (BDEs) of organic molecules play a fundamental role in determining chemical reactivity and selectivity. However, BDE computations at sufficiently high levels of quantum mechanical theory require substantial computing resources. In this paper, we develop a machine learning model capable of accurately predicting BDEs for organic molecules in a fraction of a second. We perform automated density functional theory (DFT) calculations at the M06-2X/def2-TZVP level of theory for 42,577 small organic molecules, resulting in 290,664 BDEs. A graph neural network trained on a subset of these results achieves a mean absolute error of 0.58 kcal mol⁻¹ (vs DFT) for BDEs of unseen molecules. We further demonstrate the model on two applications: first, we rapidly and accurately predict major sites of hydrogen abstraction in the metabolism of drug-like molecules, and second, we determine the dominant molecular fragmentation pathways during soot formation.
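For reference, a BDE for homolytic cleavage A–B → A• + B• is the enthalpy of the two radical fragments minus the enthalpy of the parent molecule. The sketch below shows that arithmetic and how the weakest bond could be picked from a set of values. The function names and all numbers are hypothetical illustrations, not data from the paper.

```python
def bond_dissociation_enthalpy(h_parent, h_fragment_a, h_fragment_b):
    """BDE = H(A radical) + H(B radical) - H(parent), all in the same units."""
    return h_fragment_a + h_fragment_b - h_parent


def weakest_bond(bde_by_bond):
    """Return the label of the bond with the lowest BDE."""
    return min(bde_by_bond, key=bde_by_bond.get)


# Illustrative enthalpies in kcal/mol (made-up values, not from any dataset).
example_bde = bond_dissociation_enthalpy(
    h_parent=-100.0, h_fragment_a=-30.0, h_fragment_b=20.0
)

# Illustrative per-bond BDEs for one molecule (made-up values).
example_bdes = {"C-H (site 1)": 98.5, "C-H (site 2)": 91.0, "O-H": 105.0}
likely_site = weakest_bond(example_bdes)
print(example_bde, likely_site)
```

Ranking bonds by BDE in this way is the kind of comparison that underlies identifying a likely hydrogen abstraction site, although a lowest-BDE rule alone is a simplification.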
Basin-hopping sampling has been widely used for searching local minima on a potential energy surface. Reaction intermediates, including reactants and products, are also local minima that together make up a reaction path, but sampling them by brute force is too demanding because of the large number of degrees of freedom. We developed an efficient Monte Carlo basin-hopping method to sample reaction intermediates through the fragmentation of molecules, together with a postanalysis scheme that uses graph theory with a matrix representation of molecular structures. The former greatly reduces the dimension of a given potential energy surface, while the latter offers not only effective screening of the resulting local minima toward desirable intermediates but also their automatic ordering along a reaction path. We combined the method with the density functional tight binding method for rapid calculations and tested its performance for organic reactions.
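A matrix representation of molecular structure can be as simple as an adjacency matrix whose entry (i, j) is 1 when atoms i and j are bonded. The sketch below assumes that representation, with identical atom ordering in every structure, and compares two structures by counting broken and formed bonds. It illustrates the general idea only; the abstract does not specify the matrices or screening rules actually used, and `bond_changes` and `order_by_distance` are hypothetical names.

```python
def bond_changes(adj_a, adj_b):
    """Count (broken, formed) bonds going from structure A to structure B.

    Both arguments are square 0/1 adjacency matrices (lists of lists)
    over the same atoms in the same order.
    """
    n = len(adj_a)
    broken = formed = 0
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle: each atom pair once
            if adj_a[i][j] and not adj_b[i][j]:
                broken += 1
            elif adj_b[i][j] and not adj_a[i][j]:
                formed += 1
    return broken, formed


def order_by_distance(reactant, candidates):
    """Sort candidate structures by total bond changes from the reactant.

    An assumed, simplistic ordering heuristic used here for illustration.
    """
    return sorted(candidates, key=lambda adj: sum(bond_changes(reactant, adj)))


# Three atoms: in A, atoms 0-1 are bonded; in B, atoms 1-2 are bonded.
structure_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
structure_b = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
print(bond_changes(structure_a, structure_b))
```

Structures whose bond-change count from the reactant is implausibly large could be screened out before any further calculation.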
Machine learning based on big data has emerged as a powerful solution in various chemical problems. We investigated the feasibility of machine learning models for the prediction of activation energies of gas-phase reactions. Six different models of three different types, including the artificial neural network, the support vector regression, and the tree boosting methods, were tested. We used the structural and thermodynamic properties of molecules and their differences as input features without resorting to specific reaction types so as to maintain the most general input form for broad applicability. The tree boosting method showed the best performance in terms of the coefficient of determination, mean absolute error, and root mean square error, the values of which were 0.89, 1.95 kcal mol⁻¹, and 4.49 kcal mol⁻¹, respectively. Computation time for the prediction of activation energies for 2541 test reactions was about one second on a single computing node without using accelerators.
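The three reported metrics follow their standard definitions, which the sketch below computes in plain Python. The function name `regression_metrics` and the sample values are hypothetical; the numbers are illustrative and unrelated to the paper's test set.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired reference and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


# Made-up activation energies in kcal/mol, for illustration only.
reference = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
r2, mae, rmse = regression_metrics(reference, predicted)
print(r2, mae, rmse)
```

An RMSE well above the MAE, as in the reported 4.49 versus 1.95 kcal mol⁻¹, generally indicates that a minority of predictions carry large errors, since RMSE weights outliers more heavily.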
The stabilities of radicals play a central role in determining the thermodynamics and kinetics of many reactions in organic chemistry. In this data descriptor, we provide consistent and validated quantum chemical calculations for over 200,000 organic radical species and 40,000 associated closed-shell molecules containing C, H, N and O atoms. These data consist of optimized 3D geometries, enthalpies, Gibbs free energy, vibrational frequencies, Mulliken charges and spin densities calculated at the M06-2X/def2-TZVP level of theory, which was previously found to have a favorable trade-off between experimental accuracy and computational efficiency. We expect this data to be useful in the further development of machine learning techniques to predict reaction pathways, bond strengths, and other phenomena closely related to organic radical chemistry.