The median lethal dose for rodent oral acute toxicity (LD50) is a standard piece of information required to categorize chemicals in terms of the potential hazard posed to human health after acute exposure. The exclusive use of in vivo testing is limited by the time and costs required for performing experiments and by the need to sacrifice a number of animals. (Quantitative) structure–activity relationships [(Q)SAR] proved a valid alternative to reduce and assist in vivo assays for assessing acute toxicological hazard. In the framework of a new international collaborative project, the NTP Interagency Center for the Evaluation of Alternative Toxicological Methods and the U.S. Environmental Protection Agency’s National Center for Computational Toxicology compiled a large database of rat acute oral LD50 data, with the aim of supporting the development of new computational models for predicting five regulatory relevant acute toxicity endpoints. In this article, a series of regression and classification computational models were developed by employing different statistical and knowledge-based methodologies. External validation was performed to demonstrate the real-life predictability of models. Integrated modeling was then applied to improve performance of single models. Statistical results confirmed the relevance of developed models in regulatory frameworks, and confirmed the effectiveness of integrated modeling. The best integrated strategies reached RMSEs lower than 0.50 and the best classification models reached balanced accuracies over 0.70 for multi-class and over 0.80 for binary endpoints. Computed predictions will be hosted on the EPA’s Chemistry Dashboard and made freely available to the scientific community.
Background Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generate the final prediction of a model, utilizing new, advanced machine learning algorithms and streamlining, standardization and automation of various QSAR steps. One approach that seems under-explored is at-the-runtime generation of local models specific to individual compounds. This approach was quite likely limited by the computational requirements, but with current increases in processing power and the widespread availability of cluster-computing infrastructure, this limitation is no longer that severe. Results We propose a new QSAR methodology: aiQSAR, whose aim is to generate endpoint predictions directly from the input dataset by building an array of local models generated at-the-runtime and specific for each compound in the dataset. The local group of each compound is selected on the basis of fingerprint similarities and the final prediction is calculated by integrating the results of a number of autonomous mathematical models. The method is applicable to regression, binary classification and multi-class classification and was tested on one dataset for each endpoint type: bioconcentration factor (BCF) for regression, Ames test for binary classification and Environmental Protection Agency (EPA) acute rat oral toxicity ranking for multi-class classification. As part of this method, the applicability domain of each prediction is assessed through the applicability domain measure, calculated on the basis of the fingerprint similarities in each local group of compounds. Conclusions We outline the methodology for a new QSAR-based predictive tool whose advantages are automation, group-specific approach to modelling and simplicity of execution. Our aim now will be to develop this method into a stand-alone software tool. We hope that eventual adoption of our tool would make QSAR modelling more accessible and transparent. Our methodology could be used as an initial modelling step, to predict new compounds by simply loading the training dataset as an input. Predictions could then be further evaluated and refined either by other tools or through optimization of aiQSAR parameters. Electronic supplementary material The online version of this article (10.1186/s13321-019-0350-y) contains supplementary material, which is available to authorized users.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.