Toxicology studies raise several concerns, which makes early detection of the toxicity potential of chemical compounds important. Here we formalise the problem of predicting the bioactivity of chemical compounds (as measured by in vitro assays) from their physico-chemical structure. We suggest that such prediction may be automated using machine learning (ML) techniques trained on existing data that include in vitro assay results for hundreds of chemical compounds. We test this suggestion with several ML techniques and compare the results obtained on a restricted dataset and on a global one. Since the available empirical data are unbalanced, we suggest that data augmentation techniques may be used to improve the classification accuracy of these techniques, and we present numerical comparisons of the improvements obtained in this manner.
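The abstract does not say which data augmentation technique is used. As one possible illustration of rebalancing an unbalanced active/inactive dataset, the sketch below applies plain random oversampling of the minority class. The function name `random_oversample` and the toy descriptors are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Assumption: this is generic random oversampling, used only to illustrate
    rebalancing; it is not necessarily the augmentation used in the paper.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))  # resample with replacement
            y_out.append(label)
    return X_out, y_out


# Toy descriptors (made up): four inactive compounds (0) and one active (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, resampling of this kind would normally be applied to the training split only, so that duplicated compounds do not leak into the evaluation set.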
The development of in silico tools able to predict the bioactivity and toxicity of chemical substances is a powerful way to assess toxicity as early as possible. To enable the development of such tools, the ToxCast program has generated and made publicly available in vitro bioactivity data for thousands of compounds. The goal of the present study is to characterize and explore the ToxCast data in terms of machine learning capability. To this end, a large-scale analysis of the entire database was performed to build models that predict the bioactivities measured in in vitro assays. Simple classical QSAR algorithms (ANN, SVM, LDA, random forest, and Bayesian) were first applied to the data, and the results suggested that these algorithms are not well suited to data sets with a high proportion of inactive compounds. The study then showed for the first time that an ensemble method named "Stacked generalization" could improve model performance on this type of data. Indeed, for 61% of 483 models, the Stacked method led to models with higher performance. Moreover, combining this ensemble method with an applicability domain filter allows one to assess the reliability of the predictions for further compound prioritization. In particular, we showed that for 50% of the models, the ROC score is better when compounds outside the applicability domain are excluded.
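The comparison of ROC scores with and without out-of-domain compounds can be sketched as follows. The abstract does not define the applicability domain, so the rule used here (mean distance to the `k` nearest training compounds below a threshold) is an assumption, as are the function names, the threshold, and all numbers, which are toy values rather than results from the study.

```python
import math


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random active outranks a random inactive."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def in_domain(x, X_train, k=3, threshold=0.5):
    """Hypothetical domain rule: mean distance to the k nearest training compounds."""
    dists = sorted(math.dist(x, t) for t in X_train)
    nearest = dists[:k]
    return sum(nearest) / len(nearest) <= threshold


# Toy descriptors and made-up model scores, for illustration only.
X_train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
X_test = [[0.05, 0.05], [1.05, 1.05], [5.0, 5.0], [-4.0, 3.0]]
y_test = [0, 1, 0, 1]
scores = [0.2, 0.9, 0.8, 0.1]

keep = [in_domain(x, X_train) for x in X_test]
auc_all = roc_auc(y_test, scores)
auc_in = roc_auc(
    [t for t, ok in zip(y_test, keep) if ok],
    [s for s, ok in zip(scores, keep) if ok],
)
print(auc_all, auc_in)
```

In this toy case the two distant compounds are the ones the model ranks wrongly, so filtering them raises the score. Real data need not behave this way, which is consistent with the abstract reporting an improvement for only half of the models.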
G-Networks and their simplified version, known as the Random Neural Network, have often been used to classify data. In this paper, we present an application of the Random Neural Network to the early detection of the toxicity potential of chemical compounds, through the prediction of their bioactivity from the compounds' physico-chemical structure, and propose that this prediction be automated using machine learning (ML) techniques. Specifically, the Random Neural Network is shown to be an effective analytical tool for this purpose, and the approach is illustrated and compared with several ML techniques.
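For readers unfamiliar with the model, the Random Neural Network is usually characterised by its steady-state excitation probabilities, q_i = (Λ_i + Σ_j q_j w⁺_ji) / (r_i + λ_i + Σ_j q_j w⁻_ji). The sketch below solves these equations by simple fixed-point iteration. It is a generic illustration of the model and not the classifier built in the paper; the function name, argument names, and the tiny two-neuron network are hypothetical.

```python
def rnn_steady_state(Lambda, lam, r, w_plus, w_minus, iters=200):
    """Fixed-point iteration for Random Neural Network excitation probabilities.

    Lambda[i], lam[i]: external excitatory / inhibitory arrival rates of neuron i.
    r[i]: firing rate of neuron i.
    w_plus[j][i], w_minus[j][i]: excitatory / inhibitory weights from j to i.
    """
    n = len(r)
    q = [0.0] * n
    for _ in range(iters):
        new_q = []
        for i in range(n):
            excite = Lambda[i] + sum(q[j] * w_plus[j][i] for j in range(n))
            inhibit = lam[i] + sum(q[j] * w_minus[j][i] for j in range(n))
            new_q.append(min(1.0, excite / (r[i] + inhibit)))  # probabilities cap at 1
        q = new_q
    return q


# Hypothetical two-neuron network: neuron 0 receives external input and excites neuron 1.
q = rnn_steady_state(
    Lambda=[1.0, 0.0],
    lam=[0.0, 0.0],
    r=[2.0, 1.0],
    w_plus=[[0.0, 1.0], [0.0, 0.0]],
    w_minus=[[0.0, 0.0], [0.0, 0.0]],
)
print(q)
```

In a classification setting, the external rates would typically encode the input features and the excitation probabilities of designated output neurons would be read as class scores, but how the paper sets this up is not stated in the abstract.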
The Zoetrope Genetic Programming (ZGP) algorithm is based on an original representation for mathematical expressions, targeting evolutionary symbolic regression. The zoetropic representation uses repeated fusion operations between partial expressions, starting from the terminal set. Repeated fusions within an individual gradually generate more complex expressions, ending up in what can be viewed as new features. These features are then linearly combined to best fit the training data. ZGP individuals then undergo specific crossover and mutation operators, and selection takes place between parents and offspring. ZGP is validated using a large number of public domain regression datasets, and compared to other symbolic regression algorithms, as well as to traditional machine learning algorithms. ZGP reaches state-of-the-art performance with respect to both types of algorithms, and demonstrates a low computational time compared to other symbolic regression approaches.
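The two core steps described above, repeated fusion of partial expressions followed by a linear combination of the resulting features, can be sketched loosely as below. This is not the authors' implementation: the operator set, the restriction of terminals to input variables, the choice to overwrite the first parent, the small ridge term, and all function names are assumptions made for illustration. The evolutionary loop (crossover, mutation, selection) is omitted.

```python
import random

# Binary operators available for fusion (a hypothetical choice for this sketch).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def grow_features(X, n_elements=3, n_fusions=3, rng=None):
    """Start from terminals, then repeatedly fuse two elements into one."""
    rng = rng or random.Random(0)
    n_vars = len(X[0])
    elements = []
    for _ in range(n_elements):
        j = rng.randrange(n_vars)  # terminals are input variables only
        elements.append((f"x{j}", [row[j] for row in X]))
    for _ in range(n_fusions):
        i, k = rng.sample(range(n_elements), 2)
        sym = rng.choice(sorted(OPS))
        (name_i, vals_i), (name_k, vals_k) = elements[i], elements[k]
        fused = [OPS[sym](a, b) for a, b in zip(vals_i, vals_k)]
        elements[i] = (f"({name_i} {sym} {name_k})", fused)  # overwrite one parent
    return elements


def fit_linear(features, y, ridge=1e-6):
    """Least-squares weights (intercept first) from the normal equations."""
    cols = [[1.0] * len(y)] + [vals for _, vals in features]
    p = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)] for i in range(p)]
    b = [sum(a * t for a, t in zip(cols[i], y)) for i in range(p)]
    for d in range(1, p):
        A[d][d] += ridge * (1.0 + A[d][d])  # tiny ridge guards against duplicate features
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda row: abs(A[row][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for row in range(c + 1, p):
            f = A[row][c] / A[c][c]
            for k in range(c, p):
                A[row][k] -= f * A[c][k]
            b[row] -= f * b[c]
    w = [0.0] * p
    for row in reversed(range(p)):  # back substitution
        w[row] = (b[row] - sum(A[row][k] * w[k] for k in range(row + 1, p))) / A[row][row]
    return w


# Toy data (made up) with two input variables.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [2.0, 2.0, 12.0, 12.0, 25.0]
features = grow_features(X)
weights = fit_linear(features, y)
for name, _ in features:
    print(name)
```

Each run of `grow_features` plays the role of one individual: the printed expressions are its features, and `weights` is the linear model fitted on top of them.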