Recent advances in molecular simulations have allowed scientists to investigate slower biological processes than ever before. Together with these advances came an explosion of data that has transformed a traditionally computing-bound problem into a data-bound one. Here, we present HTMD, a programmable, extensible platform written in Python that aims to solve the data generation and analysis problem and to increase reproducibility by providing a complete workspace for simulation-based discovery. So far, HTMD includes system building for CHARMM and AMBER force fields, projection methods, clustering, molecular simulation production, adaptive sampling, an Amazon cloud interface, Markov state models, and visualization. As a result, a single, short HTMD script can lead from a PDB structure to useful quantities such as relaxation time scales, equilibrium populations, metastable conformations, and kinetic rates. In this paper, we focus on the adaptive sampling and Markov state modeling features.
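As a rough illustration of how a Markov state model yields equilibrium populations and a relaxation time scale, the sketch below builds a two-state model from a discretized trajectory using only the Python standard library. It does not use HTMD, and the function names (`count_matrix`, `transition_matrix`, `two_state_summary`) are hypothetical. It assumes a simple row-normalized (non-reversible) estimator and exactly two states, so that the second eigenvalue has a closed form.

```python
import math

def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j observed `lag` frames apart (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    return counts

def transition_matrix(counts):
    """Row-normalize counts; assumes every state has at least one outgoing count."""
    return [[c / sum(row) for c in row] for row in counts]

def two_state_summary(T, lag_time):
    """Equilibrium populations and relaxation time scale of a 2x2 transition matrix."""
    a, b = T[0][1], T[1][0]            # probabilities of 0 -> 1 and 1 -> 0
    pi = (b / (a + b), a / (a + b))    # stationary distribution, pi T = pi
    lam2 = 1.0 - a - b                 # second eigenvalue of a 2x2 stochastic matrix
    if not 0.0 < lam2 < 1.0:
        raise ValueError("time scale undefined for this eigenvalue")
    return pi, -lag_time / math.log(lam2)

# Hypothetical example: transition probabilities at a lag of one time unit.
populations, timescale = two_state_summary([[0.9, 0.1], [0.2, 0.8]], lag_time=1.0)
```

With more than two states the same quantities come from a full eigendecomposition of the transition matrix, which this sketch deliberately avoids.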
The ten-year-old Houk–List model for rationalising the origin of stereoselectivity in the organocatalysed intermolecular aldol addition is revisited, using a variety of computational techniques that have been introduced or improved since the original study. Even for such a relatively small system, the role of dispersion interactions is shown to be crucial, along with the use of basis sets where the superposition errors are low. An NCI (non-covalent interactions) analysis of the transition states is able to identify the noncovalent interactions that influence the selectivity of the reaction, confirming the role of the electrostatic NCHδ+/Oδ− interactions. Simple visual inspection of the NCI surfaces is shown to be a useful tool for the design of alternative reactants. Alternative mechanisms, such as proton relays involving a water molecule or the Hajos–Parrish alternative, are shown to be higher in energy and to give computed kinetic isotope effects that are incompatible with experiment. The Amsterdam manifesto, which espouses the principle that scientific data should be citable, is followed here by using interactive data tables assembled via calls to the data DOIs (digital object identifiers) for calculations held on a digital data repository and themselves assigned a DOI.
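To make the link between computed transition-state energies and selectivity concrete, the sketch below converts a free-energy gap between two competing transition states into a product ratio. It is a generic Boltzmann estimate, not the authors' workflow. It assumes kinetic control with exactly two competing pathways, and the function name `selectivity` is hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def selectivity(ddg_kcal, temperature=298.15):
    """Return (major:minor ratio, fractional excess of the major product).

    ddg_kcal is the free-energy gap between the two competing transition
    states in kcal/mol (assumed non-negative, favouring the major product).
    """
    ratio = math.exp(ddg_kcal / (R_KCAL * temperature))
    excess = (ratio - 1.0) / (ratio + 1.0)
    return ratio, excess

# Hypothetical gap of 1 kcal/mol at room temperature.
ratio, excess = selectivity(1.0)
```

Because the ratio depends exponentially on the gap, errors of a few tenths of a kcal/mol in the computed energies shift the predicted selectivity noticeably.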
Fast and accurate molecular force field (FF) parameterization is still an unsolved problem. Accurate FFs are not available for all molecules, novel druglike molecules in particular. While methods based on quantum mechanics (QM) can parameterize them with better accuracy, they are computationally expensive and slow, which limits their applicability to a small number of molecules. Here, we present an automated FF parameterization method that can use either DFT calculations or approximate QM energies produced by different neural network potentials (NNPs) to obtain improved parameters for molecules. We demonstrate that, for the torchani-ANI-1x NNP, we can parameterize small molecules in a fraction of the time required by an equivalent parameterization using DFT QM calculations, while producing more accurate parameters than the GAFF2 force field. We expect our method to be of critical importance in computational structure-based drug discovery. The current version is available at PlayMolecule (www.playmolecule.org) and implemented in HTMD, allowing molecules to be parameterized with different QM and NNP options. (arXiv:1907.06952v2 [physics.chem-ph], 3 Aug 2019)
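The abstract does not say which FF terms are fitted or how. Purely as an illustration of fitting FF parameters to reference energies, the sketch below fits a cosine series to energies from a uniformly spaced dihedral scan. Every name here is hypothetical, the phases are fixed at zero, and the reference energies are synthetic rather than DFT or NNP output. On a uniform grid covering the full circle, the least-squares solution reduces to a discrete Fourier projection.

```python
import math

def fit_cosine_series(energies, n_terms):
    """Least-squares fit of E(phi) = offset + sum_m k_m * cos(m * phi).

    Assumes energies[j] was evaluated at phi_j = 2*pi*j/N for j = 0..N-1,
    with N > 2 * n_terms so that the cosine basis is orthogonal on the grid.
    """
    n = len(energies)
    if n <= 2 * n_terms:
        raise ValueError("need more than 2 * n_terms scan points")
    angles = [2.0 * math.pi * j / n for j in range(n)]
    offset = sum(energies) / n
    ks = [
        2.0 / n * sum(e * math.cos(m * phi) for e, phi in zip(energies, angles))
        for m in range(1, n_terms + 1)
    ]
    return offset, ks

def model_energy(phi, offset, ks):
    """Evaluate the fitted cosine series at angle phi (radians)."""
    return offset + sum(k * math.cos((m + 1) * phi) for m, k in enumerate(ks))

# Synthetic reference scan: 36 points (10 degree steps) from known coefficients.
scan = [model_energy(2.0 * math.pi * j / 36, 2.0, [1.5, 0.0, 0.4]) for j in range(36)]
offset, ks = fit_cosine_series(scan, n_terms=3)
```

In this picture, swapping the source of the reference energies (DFT versus an NNP) changes only how `scan` is produced; the fitting step stays the same.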
Passive acoustic monitoring is a well-established tool for researching the occurrence, movements, and ecology of a wide variety of marine mammal species. Advances in hardware and data collection have exponentially increased the volumes of passive acoustic data collected, such that discoveries are now limited by the time required to analyze rather than collect the data. To address this limitation, we trained a deep convolutional neural network (CNN) to identify humpback whale song in over 187,000 h of acoustic data collected at 13 different monitoring sites in the North Pacific over a 14-year period. The model successfully detected 75 s audio segments containing humpback song with an average precision of 0.97 and average area under the receiver operating characteristic curve (AUC-ROC) of 0.992. The model output was used to analyze spatial and temporal patterns of humpback song, corroborating known seasonal patterns in the Hawaiian and Mariana Islands, including occurrence at remote monitoring sites beyond well-studied aggregations, as well as novel discovery of humpback whale song at Kingman Reef, at 5° North latitude. This study demonstrates the ability of a CNN trained on a small dataset to generalize well to a highly variable signal type across a diverse range of recording and noise conditions. We demonstrate the utility of active learning approaches for creating high-quality models in specialized domains where annotations are rare. These results validate the feasibility of applying deep learning models to identify highly variable signals across broad spatial and temporal scales, enabling new discoveries through combining large datasets with cutting-edge tools.
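For readers unfamiliar with the two reported metrics, the sketch below computes average precision and AUC-ROC from per-segment labels and detector scores in plain Python. The labels and scores are made up for illustration. The study's exact evaluation protocol is not given in this text, so this shows the standard definitions only, and it does not treat tied scores specially in the average-precision ranking.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive segment."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical detector output for five audio segments (1 = contains song).
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(labels, scores)           # 4 of 6 positive/negative pairs ordered correctly
ap = average_precision(labels, scores)  # mean of 1/1, 2/3 and 3/4
```

Both functions assume at least one positive and one negative segment; average precision is the more informative of the two when song is rare relative to background noise.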