Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial data sets spanning a wide variety of chemical end points. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary data sets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.
Highlights
• A deep learning model is trained to predict antibiotics based on structure
• Halicin is predicted as an antibacterial molecule from the Drug Repurposing Hub
• Halicin shows broad-spectrum antibiotic activities in mice
• More antibiotics with distinct structures are predicted from the ZINC15 database
Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.
Is content addressable in the representation that subserves performance in multiple-object-tracking (MOT) experiments? We devised an MOT variant that featured unique, nameable objects (cartoon animals) as stimuli. There were two possible response modes: standard, in which observers were asked to report the locations of all target items, and specific, in which observers had to report the location of a particular object (e.g., "Where is the zebra?"). A measure of capacity derived from accuracy allowed for comparisons of the results between conditions. We found that capacity in the specific condition (1.4 to 2.6 items across several experiments) was always reliably lower than capacity in the standard condition (2.3 to 3.4 items). Observers could locate specific objects, indicating a content-addressable representation. However, capacity differences between conditions, as well as differing responses to the experimental manipulations, suggest that there may be two separate systems involved in tracking, one carrying only positional information, and one carrying identity information as well.
Supplementary data are available at Bioinformatics online.