Two different classes of molecular representations for use in machine learning of thermodynamic and electronic properties are studied. The representations are evaluated by monitoring the performance of linear and kernel ridge regression models on well-studied data sets of small organic molecules. One class of representations studied here counts the occurrence of bonding patterns in the molecule. These require only the connectivity of atoms in the molecule, such as may be obtained from a line diagram or a SMILES string. The second class uses the three-dimensional structure of the molecule. These include the Coulomb matrix and Bag of Bonds, which list the inter-atomic distances present in the molecule, and Encoded Bonds, which encode such lists into a feature vector whose length is independent of molecular size. The Encoded Bonds features introduced here have the advantage that models trained on smaller molecules can then be used successfully on larger molecules. A wide range of feature sets is constructed by selecting, at each rank, either a graph-based or a geometry-based feature. Here, rank refers to the number of atoms involved in the feature, e.g., atom counts are rank 1, while Encoded Bonds are rank 2. For atomization energies in the QM7 data set, the best graph-based feature set gives a mean absolute error of 3.4 kcal/mol. Inclusion of 3D geometry substantially enhances the performance, with Encoded Bonds giving 2.4 kcal/mol when used alone and 1.19 kcal/mol when combined with graph features.
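The abstract names the geometry-based representations without defining them. As a hedged illustration, the sketch below builds the standard Coulomb matrix (diagonal 0.5·Z_i^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|) and a hypothetical Encoded-Bonds-style vector that smears every inter-atomic distance onto a fixed Gaussian grid, one grid per element pair, so the output length does not depend on the number of atoms. The element list, grid range, Gaussian width, and the function names `coulomb_matrix` and `encoded_bonds_sketch` are assumptions for illustration, not the authors' definition of Encoded Bonds.

```python
import numpy as np
from itertools import combinations_with_replacement


def pairwise_distances(R):
    """All inter-atomic distances for an (n_atoms, 3) coordinate array."""
    R = np.asarray(R, dtype=float)
    return np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)


def coulomb_matrix(Z, R):
    """Standard Coulomb matrix: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i * Z_j / |R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    D = pairwise_distances(R)
    n = len(Z)
    M = np.zeros((n, n))
    off = ~np.eye(n, dtype=bool)
    M[off] = np.outer(Z, Z)[off] / D[off]
    M[np.diag_indices(n)] = 0.5 * Z ** 2.4
    return M


def encoded_bonds_sketch(Z, R, elements=(1, 6, 7, 8, 16),
                         r_min=0.5, r_max=8.0, n_grid=32, width=0.3):
    """Hypothetical fixed-length rank-2 encoding (assumed form, not the paper's).

    Each atom pair adds a Gaussian centred on its distance to the grid of its
    element pair, so the vector length is len(pairs) * n_grid for any molecule.
    """
    grid = np.linspace(r_min, r_max, n_grid)
    pairs = list(combinations_with_replacement(sorted(elements), 2))
    index = {p: k for k, p in enumerate(pairs)}
    feats = np.zeros((len(pairs), n_grid))
    D = pairwise_distances(R)
    n = len(Z)
    for i in range(n):
        for j in range(i + 1, n):
            key = tuple(sorted((int(Z[i]), int(Z[j]))))
            feats[index[key]] += np.exp(-((grid - D[i, j]) ** 2) / (2 * width ** 2))
    return feats.ravel()


# Usage: a 3-atom and a 5-atom molecule give vectors of identical length.
water_Z, water_R = [8, 1, 1], [[0, 0, 0], [0.96, 0, 0], [-0.24, 0.93, 0]]
methane_Z = [6, 1, 1, 1, 1]
methane_R = [[0, 0, 0], [0.63, 0.63, 0.63], [-0.63, -0.63, 0.63],
             [-0.63, 0.63, -0.63], [0.63, -0.63, -0.63]]
print(encoded_bonds_sketch(water_Z, water_R).shape,
      encoded_bonds_sketch(methane_Z, methane_R).shape)
```

Because the encoded vector has a fixed length, the same linear or kernel ridge model (for example scikit-learn's `KernelRidge`) can be fit on small molecules and evaluated on larger ones, which is the size-transfer property the abstract highlights; the Coulomb matrix, by contrast, grows with the number of atoms and must be padded.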
Current neural network models for predicting molecular properties use quantum chemistry only as a source of training data. This paper explores models that use quantum chemistry as an integral part of the prediction process. This is done by implementing self-consistent-charge Density-Functional Tight-Binding (DFTB) theory as a layer for use in deep learning models. The DFTB layer takes, as input, Hamiltonian matrix elements generated by earlier layers and produces, as output, electronic properties from self-consistent field solutions of the corresponding DFTB Hamiltonian. Backpropagation enables efficient training of the model to target electronic properties. Two types of input to the DFTB layer are explored: splines and feed-forward neural networks. Because overfitting can cause models trained on smaller molecules to perform poorly on larger molecules, regularizations are applied that penalize non-monotonic behavior and deviation of the Hamiltonian matrix elements from those of the published DFTB model used to initialize the model. The approach is evaluated on 15,700 hydrocarbons by comparing the root mean square error in energy and dipole moment, on test molecules with 8 heavy atoms, to the error from the initial DFTB model. When trained on molecules with up to 7 heavy atoms, the spline model reduces the test error in energy by 60% and in dipole moment by 42%. The neural network model performs somewhat better, with error reductions of 67% and 59%, respectively. Training on molecules with up to 4 heavy atoms reduces performance: both the spline and neural network models reduce the test error in energy by about 53% and in dipole moment by about 25%.
arXiv:1808.04526v2 [physics.chem-ph]
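The two regularizers described above can be illustrated on a single Hamiltonian matrix element sampled on a grid of inter-atomic distances. The sketch below is a minimal NumPy version under stated assumptions: the squared-violation form of the monotonicity penalty, the mean-squared deviation from the reference curve, the weights `w_mono` and `w_ref`, and whether a given curve should be decreasing are all hypothetical choices, not the paper's published formulation. In an actual model these terms would be written in an automatic-differentiation framework so that backpropagation reaches the spline or network parameters.

```python
import numpy as np


def monotonic_penalty(values, decreasing=True):
    """Sum of squared monotonicity violations between adjacent grid points.

    Assumed form: for a curve expected to decrease with distance, every
    positive step is penalized (and vice versa for increasing curves).
    """
    steps = np.diff(values)
    violations = np.maximum(steps, 0.0) if decreasing else np.maximum(-steps, 0.0)
    return float(np.sum(violations ** 2))


def reference_penalty(values, ref_values):
    """Mean squared deviation from the reference (published) DFTB curve."""
    return float(np.mean((np.asarray(values) - np.asarray(ref_values)) ** 2))


def regularized_loss(pred, target, curve, ref_curve,
                     w_mono=1.0, w_ref=0.1, decreasing=True):
    """Data loss on a property (e.g. energy) plus both hypothetical regularizers."""
    data = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    return float(data
                 + w_mono * monotonic_penalty(curve, decreasing)
                 + w_ref * reference_penalty(curve, ref_curve))


# Usage: a curve with one upward "wiggle" is penalized relative to a smooth one.
r = np.linspace(1.0, 5.0, 9)
ref = -np.exp(-r)                      # hypothetical reference curve, increasing toward 0
wiggly = ref + 0.05 * np.sin(6 * r)    # perturbed curve as if from an overfit model
print(regularized_loss([1.2], [1.0], wiggly, ref, decreasing=False))
```

Penalizing deviation from the initial DFTB parameters keeps the learned matrix elements close to a physically reasonable starting point, while the monotonicity term discourages the oscillations that tend to appear when a flexible model is fit to data from small molecules only.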
The proportion of different causes of death (cause-specific mortality) is an important indicator of local ecology and of the local selective forces shaping behavioral and morphological adaptations, and it can easily be compared between species. These mortality causes are best measured by remotely monitoring individuals with radio transmitter tags to detect their eventual demise and conducting postmortem examinations to determine the exact cause of death. Although studies of mortality causes have been conducted for many mammal species, there has been no attempt to examine trends across species. Here, we review data from 69 North American mammal populations across 27 mostly large and medium-sized mammal species, summarizing 2209 total mortality events, 1874 of which had known causes. Of the known causes, humans are the main cause of mortality (51.8%), followed by natural causes (48.5%). Among natural causes, predation (35.2% of known) was most prevalent in smaller species, especially herbivores. Anthropogenic causes were higher for legally unprotected populations, especially carnivores and larger species. Hunting (35.3% of known) was the most important source of human-caused mortality, followed by vehicle collision (9.2% of known), which was positively correlated with the degree of human development of the local landscape. Protected populations had a 44% lower level of human-caused mortality, although it was still an important component of their overall mortality (34.6%). Our results show the variety and pervasiveness of anthropogenic mortality in many mammal species, suggesting that humans cause most mortalities observed in larger mammals in North America. These anthropogenic mortalities may represent strong selective forces for animal populations and offer mechanistic support for the growing body of evidence for rapid evolutionary shifts in behavior and morphology in response to human-caused changes to the landscape.
We devise a novel technique to control the shape of polymer molecular weight distributions (MWDs) in atom transfer radical polymerization (ATRP). This technique makes use of recent advances in both simulation-based, model-free reinforcement learning (RL) and the numerical simulation of ATRP. A simulation of ATRP is built that allows an RL controller to add chemical reagents throughout the course of the reaction. The RL controller incorporates fully-connected and convolutional neural network architectures and bases its decisions on the current status of the ATRP reaction. The initial, untrained controller leads to final MWDs with large variability, allowing the RL algorithm to explore a large search space. When trained using an actor-critic algorithm, the RL controller is able to discover and optimize control policies that lead to a variety of target MWDs. The target MWDs include Gaussians of various widths and more diverse shapes such as bimodal distributions. The learned control policies are robust and transfer to similar but not identical ATRP reaction settings, even in the presence of simulated noise. We believe this work is a proof of concept for employing modern artificial intelligence techniques in the synthesis of new functional polymer materials.
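Training a controller toward a target MWD requires some scalar reward that compares the simulated distribution with the target. The abstract does not state the reward, so the sketch below assumes one plausible choice: represent each MWD as a normalized histogram over chain length and use the negative L1 distance between final and target histograms, which lies in [−2, 0] and equals 0 only for an exact match. The helper names `gaussian_mwd`, `bimodal_mwd`, and `mwd_reward`, and the chain-length grid, are hypothetical; the actor-critic learner itself is not shown.

```python
import numpy as np


def gaussian_mwd(chain_lengths, mean, std):
    """Normalized Gaussian target distribution over chain length (hypothetical)."""
    w = np.exp(-0.5 * ((chain_lengths - mean) / std) ** 2)
    return w / w.sum()


def bimodal_mwd(chain_lengths, means, stds, weights=(0.5, 0.5)):
    """Normalized mixture of two Gaussians, e.g. a bimodal target MWD."""
    mix = sum(wt * gaussian_mwd(chain_lengths, m, s)
              for wt, m, s in zip(weights, means, stds))
    return mix / mix.sum()


def mwd_reward(final_mwd, target_mwd):
    """Assumed terminal reward: negative L1 distance between normalized MWDs."""
    p = np.asarray(final_mwd, dtype=float)
    q = np.asarray(target_mwd, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return -float(np.abs(p - q).sum())


# Usage: compare a (simulated) final MWD with a narrow Gaussian target.
lengths = np.arange(1, 201)
target = gaussian_mwd(lengths, mean=50, std=5)
final = gaussian_mwd(lengths, mean=55, std=8)
print(mwd_reward(final, target))
```

In an episodic setup of this kind, the reward would be issued once the simulated reaction ends, and the critic's value estimates would help the actor attribute that delayed reward to the individual reagent-addition decisions.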