Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples in "one shot", vast computational effort is invested in simulating these systems in small steps, e.g., using Molecular Dynamics. Combining deep learning and statistical mechanics, we here develop Boltzmann Generators, which are shown to generate unbiased one-shot equilibrium samples of representative condensed-matter systems and proteins. Boltzmann Generators use neural networks to learn a coordinate transformation of the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free energy differences and discovery of new configurations are demonstrated, providing a statistical mechanics tool that can avoid rare events during sampling without prior knowledge of reaction coordinates.
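The coordinate-transformation idea can be illustrated with a one-dimensional toy sketch. Everything here is an assumption made for illustration: the quadratic `energy`, the fixed affine `flow` standing in for a trained invertible network, and all parameter values. It is not the authors' implementation. Samples are drawn from a Gaussian, pushed through the transformation, and reweighted by the ratio of the Boltzmann weight to the generated density, which removes the bias of an imperfect transformation.

```python
import math
import random


def energy(x):
    # Hypothetical reduced energy u(x): a harmonic well centred at 2.0, width 0.5.
    return (x - 2.0) ** 2 / (2 * 0.5 ** 2)


def flow(z, scale=0.6, shift=1.8):
    # Stand-in for a trained invertible network (deliberately imperfect).
    # Returns the transformed sample x and log|dx/dz|.
    return scale * z + shift, math.log(abs(scale))


def log_prior(z):
    # Log density of the standard normal that is easy to sample.
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def sample_reweighted(n, seed=0):
    rng = random.Random(seed)
    xs, log_ws = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x, log_det = flow(z)
        # Change of variables: log q(x) = log p_Z(z) - log|dx/dz|.
        log_q = log_prior(z) - log_det
        # Importance weight: Boltzmann factor divided by generated density.
        log_ws.append(-energy(x) - log_q)
        xs.append(x)
    return xs, log_ws


def reweighted_mean(xs, log_ws):
    # Subtract the maximum log-weight for numerical stability.
    m = max(log_ws)
    ws = [math.exp(lw - m) for lw in log_ws]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


xs, log_ws = sample_reweighted(20000)
raw_mean = sum(xs) / len(xs)               # biased towards the flow's shift
corrected_mean = reweighted_mean(xs, log_ws)  # close to the well centre
```

In this sketch the raw samples are centred on the transformation's shift, while the reweighted average recovers the centre of the energy well.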
Coarse-grained (CG) molecular simulations have become a standard tool to study molecular processes on time and length scales inaccessible to all-atom simulations. Parametrizing CG force fields to match all-atom simulations has mainly relied on force-matching or relative entropy minimization, which require many samples from costly simulations with all-atom or CG resolutions, respectively. Here we present flow-matching, a new training method for CG force fields that combines the advantages of both methods by leveraging normalizing flows, a generative deep learning method. Flow-matching first trains a normalizing flow to represent the CG probability density, which is equivalent to minimizing the relative entropy without requiring iterative CG simulations. Subsequently, the flow generates samples and forces according to the learned distribution in order to train the desired CG free energy model via force-matching. Even without requiring forces from the all-atom simulations, flow-matching outperforms classical force-matching by an order of magnitude in terms of data efficiency and produces CG models that can capture the folding and unfolding transitions of small proteins.
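The second stage described above, training a CG model by force-matching on samples and forces produced by the flow, can be sketched in one dimension. The assumptions are loud ones: the "flow" is replaced by a known Gaussian density with hypothetical parameters `MU` and `SIGMA`, forces are taken as the gradient of its log density (in units of kT), and the CG model is a single harmonic term fitted by least squares. This illustrates the force-matching step only, not the published method.

```python
import random

MU, SIGMA = 1.0, 0.5  # hypothetical parameters of the learned density


def flow_sample_and_force(rng):
    # Stand-in for a trained flow: draw x and return the force
    # d/dx log p(x) implied by the density (units of kT).
    x = rng.gauss(MU, SIGMA)
    force = -(x - MU) / SIGMA ** 2
    return x, force


def fit_harmonic_by_force_matching(xs, fs):
    # Model force F(x) = -k * (x - x0) = slope * x + intercept.
    # Minimising the squared force error is ordinary linear regression.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_f = sum(fs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxf = sum((x - mean_x) * (f - mean_f) for x, f in zip(xs, fs))
    slope = sxf / sxx
    intercept = mean_f - slope * mean_x
    k = -slope
    x0 = intercept / k
    return k, x0


rng = random.Random(0)
pairs = [flow_sample_and_force(rng) for _ in range(1000)]
k, x0 = fit_harmonic_by_force_matching([p[0] for p in pairs],
                                       [p[1] for p in pairs])
```

Because the toy forces are noise-free, the fit recovers `k = 1 / SIGMA**2` and `x0 = MU` up to floating-point error; with a real flow and a neural CG model, the same squared-error objective would be minimised by gradient descent instead.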
Abstract
1. In recent years, large-scale DNA barcoding campaigns have generated an enormous amount of COI barcodes, which are usually stored in NCBI's GenBank and the official Barcode of Life database (BOLD). BOLD data are generally associated with more detailed and better curated metadata, because a great proportion is based on expert-verified and vouchered material, accessible in public collections. In the course of the German Barcode of Life initiative, data were generated for the reference library of 2,846 species of Coleoptera from 13,516 individuals.
2. Confronted with the high effort associated with identification, verification and data validation, a bioinformatic pipeline, "TaxCI", was developed that (1) identifies taxonomic inconsistencies in a given tree topology (optionally including a reference dataset), (2) discriminates between different cases of incongruence in order to identify contamination or misidentified specimens, and (3) graphically marks those cases in the tree, which can then be checked again and, if needed, corrected or removed from the dataset. For this, "TaxCI" may use DNA-based species delimitations from other approaches (e.g. mPTP) or may perform implemented threshold-based clustering.
3. The data-processing pipeline was tested on a newly generated set of barcodes, using the available BOLD records as a reference. A data revision based on the first TaxCI run yielded, in a second TaxCI analysis, a taxonomic match ratio very similar to the one recorded from the reference set (92% vs. 94%). The revised dataset improved by nearly 20% through this procedure compared to the original, uncorrected one.
4. Overall, the new processing pipeline for DNA barcode data allows for the rapid and easy identification of inconsistencies in large datasets, which can be dealt with
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
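The threshold-based clustering and inconsistency check mentioned for TaxCI can be illustrated with a small sketch. This is not TaxCI's code: the uncorrected p-distance, the single-linkage grouping, the 0.1 threshold, the function names and the toy sequences are all assumptions chosen for illustration. Sequences closer than the threshold are merged into one cluster, and any cluster carrying more than one species name is reported as a candidate misidentification or contamination.

```python
def p_distance(a, b):
    # Proportion of differing sites between two aligned sequences.
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def threshold_clusters(seqs, threshold):
    # Single-linkage clustering via union-find: any pair within the
    # threshold ends up in the same cluster.
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def mixed_clusters(labels, cluster_ids):
    # Clusters that contain more than one species name are inconsistent.
    by_cluster = {}
    for label, cid in zip(labels, cluster_ids):
        by_cluster.setdefault(cid, set()).add(label)
    return [sorted(names) for names in by_cluster.values() if len(names) > 1]


# Hypothetical toy data: the third record is identical to the first
# but carries a different species name.
labels = ["sp_A", "sp_A", "sp_B", "sp_C"]
seqs = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "ACGTACGTACGTACGTACGT",
    "TTTTACGTTTTTACGTTTTT",
]
cluster_ids = threshold_clusters(seqs, threshold=0.1)
flagged = mixed_clusters(labels, cluster_ids)
```

Here `flagged` lists the cluster in which `sp_A` and `sp_B` share nearly identical barcodes, which is the kind of case a curator would recheck before correcting or removing the record.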
Argumentation mining is considered a key technology for future search engines and automated decision making. In such applications, argumentative text segments have to be mined from large and diverse document collections. However, most existing argumentation mining approaches tackle the classification of argumentativeness only for a few manually annotated documents from narrow domains and registers. This limits their practical applicability. We hence propose a distant supervision approach that acquires argumentative text segments automatically from online debate portals. Experiments across domains and registers show that training on such a corpus improves the effectiveness and robustness of mining argumentative text. We freely provide the underlying corpus for research.