Virtual compound libraries, descriptions of all of the structures that might be produced by specified transformations involving specified reagents, are especially useful in molecular discovery when suitably fast and relevant searching techniques are available. Issues to be considered include fundamental data structures, neighborhood searching principles, useful searching approaches and techniques, library definition and construction, algorithmic details of library comparison, and user interfaces.
Drug discovery and development is a costly and time-consuming endeavor (Calcoen et al. Nat Rev Drug Discov 14(3):161-162, 2015; The truly staggering cost of inventing new drugs. Forbes. http://www.forbes.com/sites/matthewherper/2012/02/10/the-truly-staggering-cost-of-inventing-new-drugs/, 2012; Scannell et al. Nat Rev Drug Discov 11(3):191-200, 2012). Over the last two decades, computational tools and in silico models that predict the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profiles of molecules have been incorporated into the drug discovery process, mainly to avoid late-stage failures due to poor pharmacokinetics and toxicity. It is now widely recognized that ADMET issues should be addressed as early as possible in drug discovery. Here, we describe in detail how ADMET models can be developed and applied using a commercially available package, ADMET Predictor™ 7.2 (ADMET Predictor v7.2. Simulations Plus, Inc., Lancaster, CA, USA).
Background: Quantitative structure-activity (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions.
Results: Submodels in an ensemble model that have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs) using two different methods for determining ensemble classification (one using vote tallies and the other averaging individual network outputs), we have found that the distribution of predictions across positive vote tallies can be reasonably well-modeled as a beta binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets comprising logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted the distribution of predictions and errors for large external validation sets, even when the numbers of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified as a function of the degree of consensus between networks in the ensemble could in most cases be estimated accurately from the fitted beta binomial distributions for the training pool.
Conclusions: Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
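The error-estimation idea in the abstract above can be illustrated with a small sketch. It assumes an ensemble of n networks, a histogram of how many predictions received each positive vote tally (0 to n), and a matching histogram of how many of those predictions were wrong. The abstract does not say how the beta binomial distributions were fitted, so the method-of-moments fit used here is an assumption, and the function names and the counts in the usage note are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) for a beta binomial distribution with n trials and shape (a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def fit_beta_binomial(counts):
    """Method-of-moments fit (an assumed fitting procedure, not the paper's).

    counts[k] is the number of observations with vote tally k, for k = 0..n.
    Returns the shape parameters (a, b).
    """
    n = len(counts) - 1
    total = sum(counts)
    m1 = sum(k * c for k, c in enumerate(counts)) / total      # first raw moment
    m2 = sum(k * k * c for k, c in enumerate(counts)) / total  # second raw moment
    denom = n * (m2 / m1 - m1 - 1) + m1
    a = (n * m1 - m2) / denom
    b = (n - m1) * (n - m2 / m1) / denom
    if a <= 0 or b <= 0:
        # Happens when the tallies are not overdispersed relative to a binomial.
        raise ValueError("moment estimates are not valid beta binomial parameters")
    return a, b


def error_profile(pred_counts, err_counts):
    """Estimated probability of error at each vote tally 0..n.

    The estimate is the fitted expected number of errors at a tally divided by
    the fitted expected number of predictions at that tally, capped at 1.
    """
    n = len(pred_counts) - 1
    a_p, b_p = fit_beta_binomial(pred_counts)
    a_e, b_e = fit_beta_binomial(err_counts)
    n_pred, n_err = sum(pred_counts), sum(err_counts)
    profile = []
    for k in range(n + 1):
        expected_preds = n_pred * beta_binomial_pmf(k, n, a_p, b_p)
        expected_errs = n_err * beta_binomial_pmf(k, n, a_e, b_e)
        profile.append(min(1.0, expected_errs / expected_preds))
    return profile
```

For example, with made-up counts for a 10-network ensemble, `error_profile([300, 120, 60, 40, 30, 25, 30, 40, 60, 120, 300], [6, 10, 12, 14, 14, 13, 14, 14, 12, 10, 6])` gives a much higher estimated error rate for a split vote (tally 5) than for a unanimous one (tally 0 or 10), which is the qualitative behavior the abstract describes.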