In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modeling method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation; splitting data into training, validation and test sets; descriptor filtering; application of modeling techniques; and selection of the best model. We apply this automatic process to data sets of blood-brain barrier penetration and aqueous solubility, and use external test sets to compare the resulting automatically generated models with 'manually' built models. The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models: a small set of in vivo data and a large set of physico-chemical data.
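The workflow stages named in this abstract can be sketched in a few lines of Python. This is only an illustration of the scaffolding (data splitting, descriptor filtering and model selection on a validation set), not the authors' implementation. It does not fit a Gaussian Process. The function names, the 70/15/15 split and the variance threshold are all assumptions introduced here.

```python
import random
from math import sqrt
from statistics import pvariance


def split_indices(n, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle row indices and cut them into training, validation and test sets.

    The split fractions are an assumption, not taken from the article.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def keep_informative_columns(rows, min_variance=1e-8):
    """Return indices of descriptor columns whose variance exceeds a threshold.

    A simple stand-in for descriptor filtering: near-constant columns carry
    no information and are dropped.
    """
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > min_variance]


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def select_best(models, X_val, y_val):
    """Pick the candidate with the lowest RMSE on the validation set.

    `models` maps a name to a hypothetical predict function taking one row.
    """
    scores = {name: rmse([f(x) for x in X_val], y_val) for name, f in models.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

In this sketch the test set is never consulted during selection, so it remains available for an independent estimate of the chosen model's performance.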
ADMET models, whether in silico or in vitro, are commonly used to 'profile' molecules, to identify potential liabilities or filter out molecules expected to have undesirable properties. While useful, this is the most basic application of such models. Here, we will show how models may be used to go 'beyond profiling' to guide key decisions in drug discovery, for example, selecting with confidence the chemical series on which to focus resources, or designing improved molecules by targeting structural modifications that improve key properties. To prioritise molecules and chemical series, the success criteria for properties and their relative importance to a project's objective must be defined. Data from models (experimental or predicted) may then be used to assess each molecule's balance of properties against those requirements. However, to make decisions with confidence, the uncertainties in all of the data must also be considered. In silico models encode information regarding the relationship between molecular structure and properties. This is used to predict the property value of a novel molecule. However, further interpretation can yield information on the contributions of different groups in a molecule to the property and the sensitivity of the property to structural changes. Visualising this information can guide the redesign process. In this article, we describe methods to achieve these goals and drive drug-discovery decisions, and illustrate the results with practical examples.
All of the experimental compound data with which we work have significant uncertainties, due to imperfect correlations between experimental systems and the ultimate in vivo properties of compounds and the inherent variability in experimental conditions. When using these data to make decisions, it is essential that these uncertainties are taken into account to avoid making inappropriate decisions in the selection of compounds, which can lead to wasted effort and missed opportunities. In this paper we will consider approaches to rigorously account for uncertainties when selecting between compounds or assessing compounds against a property criterion: first for an individual measurement of a single property and then for multiple measurements of a property for the same compound. We will then explore how uncertainties in multiple properties can be combined when assessing compounds against a profile of criteria, a process known as multi-parameter optimisation. This guides rigorous decision-making using complex, uncertain data to focus on compounds with the best chance of success, while avoiding the missed opportunities that result from inappropriately rejecting compounds.
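The kinds of calculation described in this abstract can be illustrated with a minimal sketch. It assumes Gaussian measurement error with a known standard deviation and independence between properties. These are simplifying assumptions made here for illustration, and the paper's own treatment may differ. All function names are hypothetical.

```python
from math import prod, sqrt
from statistics import NormalDist, mean


def prob_above(value, sd, threshold):
    """P(true value > threshold) for one measurement with Gaussian error.

    Requires sd > 0.
    """
    return 1.0 - NormalDist(mu=value, sigma=sd).cdf(threshold)


def combine_replicates(measurements, sd):
    """Mean of n replicate measurements and its standard error, sd / sqrt(n).

    Assumes every replicate has the same, independent error sd.
    """
    n = len(measurements)
    return mean(measurements), sd / sqrt(n)


def prob_first_exceeds_second(value_a, sd_a, value_b, sd_b):
    """P(true A > true B) when both are measured with independent Gaussian error."""
    sd_diff = sqrt(sd_a ** 2 + sd_b ** 2)
    return NormalDist().cdf((value_a - value_b) / sd_diff)


def profile_score(probabilities):
    """Probability of meeting every criterion, assuming independent properties."""
    return prod(probabilities)


# Example: four replicates of a property with an assumed per-measurement sd of 0.4.
avg, se = combine_replicates([6.0, 6.4, 6.2, 6.0], sd=0.4)
p_meets_criterion = prob_above(avg, se, threshold=6.0)
```

Averaging replicates narrows the standard error, so the same mean value gives a more decisive probability of passing or failing a criterion than a single measurement would.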
In this article we describe a computational method that automatically generates chemically relevant compound ideas from an initial molecule, closely integrated with in silico models, and a probabilistic scoring algorithm to highlight the compound ideas most likely to satisfy a user-defined profile of required properties. The new compound ideas are generated using medicinal chemistry 'transformation rules' taken from examples in the literature. We demonstrate that the set of 206 transformations employed is generally applicable, produces a wide range of new compounds, and is representative of the types of modifications previously made to move from lead-like to drug-like compounds. Furthermore, we show that more than 94% of the compounds generated by transformation of typical drug-like molecules are acceptable to experienced medicinal chemists. Finally, we illustrate an application of our approach to the lead compound from which duloxetine, a marketed serotonin reuptake inhibitor, was ultimately discovered.