Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high-quality dataset is a vital step in realizing this goal, and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL), including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data, not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general. Electronic supplementary material: The online version of this article (doi:10.1186/s13321-017-0203-5) contains supplementary material, which is available to authorized users.
The International Conference on Harmonization (ICH) M7 guideline allows the use of in silico approaches for predicting Ames mutagenicity for the initial assessment of impurities in pharmaceuticals. This is the first international guideline that addresses the use of quantitative structure–activity relationship (QSAR) models in lieu of actual toxicological studies for human health assessment. Therefore, QSAR models for Ames mutagenicity now require higher predictive power for identifying mutagenic chemicals. To increase the predictive power of QSAR models, larger experimental datasets from reliable sources are required. The Division of Genetics and Mutagenesis, National Institute of Health Sciences (DGM/NIHS) of Japan recently established a unique proprietary Ames mutagenicity database containing 12,140 new chemicals that have not been previously used for developing QSAR models. The DGM/NIHS provided this Ames database to QSAR vendors to validate and improve their QSAR tools. The Ames/QSAR International Challenge Project was initiated in 2014 with 12 QSAR vendors testing 17 QSAR tools against these compounds in three phases. We now present the final results. All tools were considerably improved by participation in this project. Most tools achieved >50% sensitivity (positive prediction among all Ames positives) and predictive power (accuracy) was as high as 80%, almost equivalent to the inter-laboratory reproducibility of Ames tests. To further increase the predictive power of QSAR tools, accumulation of additional Ames test data is required as well as re-evaluation of some previous Ames test results. Indeed, some Ames-positive or Ames-negative chemicals may have previously been incorrectly classified because of methodological weakness, resulting in false-positive or false-negative predictions by QSAR tools. These incorrect data hamper prediction and are a source of noise in the development of QSAR models. It is thus essential to establish a large benchmark database consisting only of well-validated Ames test results to build more accurate QSAR models.
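The two figures quoted in the abstract above, sensitivity (positive predictions among all Ames positives) and accuracy, can be illustrated with a minimal sketch. The labels below are invented toy data, not results from the Ames/QSAR project, and the function names are hypothetical; 1 is assumed to mean Ames-positive and 0 Ames-negative.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = Ames-positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def sensitivity(counts: Dict[str, int]) -> float:
    """Fraction of experimentally positive chemicals that were predicted positive."""
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else float("nan")


def accuracy(counts: Dict[str, int]) -> float:
    """Fraction of all chemicals whose prediction matches the experimental call."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else float("nan")


# Toy data (assumed for illustration only): 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, sensitivity(counts), accuracy(counts))
```

Because Ames-positive chemicals are typically a minority, a tool can reach high accuracy while still missing many positives, which is why the abstract reports sensitivity separately.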
Computational prediction of xenobiotic metabolism can provide valuable information to guide the development of drugs, cosmetics, agrochemicals, and other chemical entities. We have previously developed FAME 2, an effective tool for predicting sites of metabolism (SoMs). In this work, we focus on the prediction of the chemical structures of metabolites, in particular metabolites of xenobiotics. To this end, we have developed a new tool, GLORY, which combines SoM prediction with FAME 2 and a new collection of rules for metabolic reactions mediated by the cytochrome P450 enzyme family. GLORY has two modes: MaxEfficiency and MaxCoverage. For MaxEfficiency mode, the use of predicted SoMs to restrict the locations in the molecule at which the reaction rules could be applied was explored. For MaxCoverage mode, the predicted SoM probabilities were instead used to develop a new scoring approach for the predicted metabolites. With this scoring approach, GLORY achieves a recall of 0.83 and can predict at least one known metabolite within the top three ranked positions for 76% of the molecules of a new, manually curated test set. GLORY is freely available as a web server at https://acm.zbh.uni-hamburg.de/glory/, and the datasets and reaction rules are provided in the Supplementary Material.
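As a rough illustration of the two evaluation figures mentioned above (recall, and the share of molecules with at least one known metabolite in the top three ranks), the following sketch computes both from ranked predictions. It does not reproduce GLORY's scoring or its exact metric definitions; the function names, the pooled definition of recall, and the toy identifiers are all assumptions made for this example.

```python
from typing import Dict, List, Set


def recall(predictions: Dict[str, List[str]], known: Dict[str, Set[str]]) -> float:
    """Fraction of known metabolites (pooled over all molecules) that were predicted at any rank."""
    found = 0
    total = 0
    for molecule, truth in known.items():
        predicted = set(predictions.get(molecule, []))
        found += len(truth & predicted)
        total += len(truth)
    return found / total if total else float("nan")


def top_k_hit_rate(predictions: Dict[str, List[str]], known: Dict[str, Set[str]], k: int = 3) -> float:
    """Fraction of molecules with at least one known metabolite among the k top-ranked predictions."""
    if not known:
        return float("nan")
    hits = 0
    for molecule, truth in known.items():
        top = predictions.get(molecule, [])[:k]
        if any(metabolite in truth for metabolite in top):
            hits += 1
    return hits / len(known)


# Toy data (hypothetical identifiers; real use would compare canonical structures).
known = {"mol_A": {"a1", "a2"}, "mol_B": {"b1"}}
predictions = {
    "mol_A": ["x", "a1", "y"],       # a1 found at rank 2, a2 missed
    "mol_B": ["p", "q", "r", "b1"],  # b1 found, but only at rank 4
}
print(recall(predictions, known), top_k_hit_rate(predictions, known, k=3))
```

The two numbers answer different questions: recall rewards coverage regardless of rank, while the top-k rate only credits predictions that a user would see near the head of the list.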