Reaction classification is an important task in many applications and has traditionally been accomplished using hand-coded, rule-based approaches. However, the availability of large collections of reactions enables data-driven approaches to be developed. We present the development and validation of a 336-class machine learning-based classification model integrated within a Conformal Prediction (CP) framework to associate reaction class predictions with confidence estimations. We also propose a data-driven approach for “dynamic” reaction fingerprinting to maximize the effectiveness of reaction encoding, and a novel reaction classification system that organizes labels into four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP-augmented model can be improved by defining confidence thresholds to detect predictions that are less likely to be false. For example, the external validation of the model reports 95% of predictions as correct by filtering out less than 15% of the uncertain classifications. The application of the model is demonstrated by classifying two reaction data sets: one extracted from an industrial ELN and the other from the medicinal chemistry literature. We show how confidence estimations and class compositions across different levels of information can be used to gain immediate insights into the nature of reaction collections and hidden relationships between reaction classes.
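The idea of attaching a confidence estimate to each class prediction and filtering on a threshold can be sketched with a generic inductive conformal predictor. This is a minimal illustration, not the model described in the abstract: the nonconformity measure (one minus the predicted class probability), the function names, the two class labels, and all probabilities are assumptions chosen for the example.

```python
from typing import Dict, List, Sequence, Tuple


def calibration_scores(probs: Sequence[Dict[str, float]],
                       labels: Sequence[str]) -> List[float]:
    """Nonconformity of each calibration example: 1 - probability of its true class."""
    return [1.0 - p[y] for p, y in zip(probs, labels)]


def p_values(test_probs: Dict[str, float],
             cal_scores: Sequence[float]) -> Dict[str, float]:
    """Conformal p-value for every candidate class of one test example."""
    n = len(cal_scores)
    result = {}
    for label, prob in test_probs.items():
        score = 1.0 - prob
        # Count calibration examples at least as "strange" as this candidate.
        at_least_as_strange = sum(1 for s in cal_scores if s >= score)
        result[label] = (at_least_as_strange + 1) / (n + 1)
    return result


def predict_with_confidence(test_probs: Dict[str, float],
                            cal_scores: Sequence[float]) -> Tuple[str, float, float]:
    """Return (predicted class, confidence, credibility).

    Confidence is 1 minus the second-largest p-value;
    credibility is the largest p-value.
    """
    ranked = sorted(p_values(test_probs, cal_scores).items(),
                    key=lambda kv: kv[1], reverse=True)
    best_label, credibility = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, 1.0 - second, credibility


def is_accepted(confidence: float, threshold: float) -> bool:
    """Keep a prediction only if its confidence reaches the chosen threshold."""
    return confidence >= threshold


# Hypothetical calibration set: classifier probabilities and true labels.
cal_probs = [
    {"amide_coupling": 0.9, "suzuki_coupling": 0.1},
    {"amide_coupling": 0.2, "suzuki_coupling": 0.8},
    {"amide_coupling": 0.6, "suzuki_coupling": 0.4},
    {"amide_coupling": 0.3, "suzuki_coupling": 0.7},
]
cal_labels = ["amide_coupling", "suzuki_coupling", "amide_coupling", "amide_coupling"]
scores = calibration_scores(cal_probs, cal_labels)

# One clear-cut and one ambiguous test reaction (hypothetical probabilities).
for probs in ({"amide_coupling": 0.95, "suzuki_coupling": 0.05},
              {"amide_coupling": 0.55, "suzuki_coupling": 0.45}):
    label, confidence, credibility = predict_with_confidence(probs, scores)
    print(label, round(confidence, 2), round(credibility, 2),
          "kept" if is_accepted(confidence, 0.75) else "filtered out")
```

Raising the threshold removes more predictions but leaves a set in which a larger share is expected to be correct, which is the trade-off the abstract quantifies for the external validation.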
Reaction-based de novo design refers to the in-silico generation of novel chemical structures by combining reagents using structural transformations derived from known reactions. The driver for using reaction-based transformations is to increase the likelihood of the designed molecules being synthetically accessible. We have previously described a reaction-based de novo design method based on reaction vectors which are transformation rules that are encoded automatically from reaction databases. A limitation of reaction vectors is that they account for structural changes that occur at the core of a reaction only, and they do not consider the presence of competing functionalities that can compromise the reaction outcome. Here, we present the development of a Reaction Class Recommender to enhance the reaction vector framework. The recommender is intended to be used as a filter on the reaction vectors that are applied during de novo design to reduce the combinatorial explosion of in-silico molecules produced while limiting the generated structures to those which are most likely to be synthesisable. The recommender has been validated using an external data set extracted from the recent medicinal chemistry literature and in two simulated de novo design experiments. Results suggest that the use of the recommender drastically reduces the number of solutions explored by the algorithm while preserving the chance of finding relevant solutions and increasing the global synthetic accessibility of the designed molecules.
Hydrogen bonding is an interaction of great importance in drug discovery and development as it may significantly affect chemical and biological processes, including the interaction of small molecules with other molecules, proteins, and membranes. In particular, hydrogen bonding can impact drug-like properties such as target affinity and oral availability, which are critical to developing effective pharmaceuticals. Numerous methods for the calculation of properties such as hydrogen-bond strengths, free energy of hydration, or water solubility have therefore been proposed over time. However, access to efficient methods for predicting such properties is still limited. Here, we present the development of Jazzy, an open-source tool for the prediction of hydrogen-bond strengths and free energies of hydration of small molecules. Jazzy also allows the visualisation of hydrogen-bond strengths with atomistic resolution to support the design of compounds with desired properties and the interpretation of existing data. The tool is described in its implementation, parameter fitting, and validation against two data sets of experimental hydration free energies. Jazzy is also applied to two chemical series of bioactive compounds to show that hydrogen-bond strengths can be used to understand their structure–activity relationships. Results from the validations highlight the strengths and limitations of Jazzy, and suggest its suitability for interactive design, screening, and machine-learning featurisation.
Reaction-based de novo design refers to the generation of synthetically accessible molecules using transformation rules extracted from known reactions in the literature. In this context, we have previously described the extraction of reaction vectors from a reactions database and their coupling with a structure generation algorithm for the generation of novel molecules from a starting material. An issue when designing molecules from a starting material is the combinatorial explosion of possible product molecules that can be generated, especially for multistep syntheses. Here, we present the development of RENATE, a reaction-based de novo design tool, which is based on a pseudo-retrosynthetic fragmentation of a reference ligand and an inside-out approach to de novo design. The reference ligand is fragmented; each fragment is used to search for similar fragments as building blocks; the building blocks are combined into products using reaction vectors; and a synthetic route is suggested for each product molecule. The RENATE methodology is presented followed by a retrospective validation to recreate a set of approved drugs. Results show that RENATE can generate very similar or even identical structures to the corresponding input drugs, hence validating the fragmentation, search, and design heuristics implemented in the tool.
The isoelectric point (pI) is a fundamental physicochemical property of peptides and proteins. It is widely used to steer design away from low solubility and aggregation and to guide peptide separation and purification. Experimental measurements of pI can be replaced by calculations when the ionizable groups of a peptide and their corresponding pKa values are known. Different pKa sets are published in the literature for natural amino acids; however, they are insufficient to describe synthetically modified peptides, complex peptides of natural origin, and peptides conjugated with structures of other modalities. Noncanonical modifications (nCAAs) are ignored in the conventional sequence-based pI calculations, which therefore produce large errors in pI predictions for such peptides. In this work, we describe a pI calculation method that uses the chemical structure as an input, automatically identifies ionizable groups of nCAAs and other fragments, and performs pKa predictions for them. The method is validated on a curated set of experimental measures on 29 modified and 119,093 natural peptides, providing an improvement of R² from 0.74 to 0.95 and 0.96 against the conventional sequence-based approach for modified peptides for the two studied pKa prediction tools, ACDlabs and pKaMatcher, respectively. The method is available in the form of an open-source Python library at https://github.com/AstraZeneca/peptidetools, which can be integrated into other proprietary and free software packages. We anticipate that the pI calculation tool may facilitate optimization and purification activities across various application domains of peptides, including the development of biopharmaceuticals.
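Once the ionizable groups and their pKa values are known, the pI is the pH at which the net charge is zero. The sketch below shows that general calculation using the Henderson-Hasselbalch relation and bisection. It does not use or reproduce the API of the library named in the abstract; the function names are hypothetical, and the pKa values in the example are illustrative textbook-style numbers for a single amine and a single carboxylic acid.

```python
from typing import Sequence


def net_charge(ph: float,
               acidic_pkas: Sequence[float],
               basic_pkas: Sequence[float]) -> float:
    """Average net charge at a given pH from Henderson-Hasselbalch fractions."""
    # Basic groups carry +1 when protonated.
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in basic_pkas)
    # Acidic groups carry -1 when deprotonated.
    negative = sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in acidic_pkas)
    return positive - negative


def isoelectric_point(acidic_pkas: Sequence[float],
                      basic_pkas: Sequence[float],
                      low: float = 0.0,
                      high: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection.

    The net charge decreases monotonically with pH, so halving the
    interval converges on the crossing point. If the charge never
    crosses zero between `low` and `high`, the result sits at a bound.
    """
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(mid, acidic_pkas, basic_pkas) > 0.0:
            low = mid   # still positively charged: pI is at higher pH
        else:
            high = mid  # neutral or negative: pI is at lower pH
    return (low + high) / 2.0


# Illustrative values only: one carboxylic acid (pKa 2.34) and one amine (pKa 9.60).
pi_value = isoelectric_point(acidic_pkas=[2.34], basic_pkas=[9.60])
print(round(pi_value, 2))  # 5.97, the midpoint of the two pKa values
```

A structure-based method of the kind the abstract describes would supply the two pKa lists from the ionizable groups it detects, including those on modified residues that a sequence-only table would miss.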