The Tox21 10K Compound Library: Collaborative Chemistry Advancing Toxicology

Richard, Ann M.; Huang, Ruili; Waidyanatha, Suramya; Shinn, Paul; Collins, Bradley J.; Thillainadarajah, Inthirany; Grulke, Christopher M.; Williams, Antony J.; Lougee, Ryan; Judson, Richard S.; Houck, Keith A.; Shobair, Mahmoud; Yang, Chihae; Rathman, James F.; Yasgar, Adam; Fitzpatrick, Suzanne; Simeonov, Anton; Thomas, Russell S.; Crofton, Kevin M.; Paules, Richard S.; Bucher, John R.; Austin, Christopher P.; Kavlock, Robert J.; Tice, Raymond R.

doi:10.1021/acs.chemrestox.0c00264

Cited by 190 publications

(161 citation statements)

References 53 publications

Supporting

Mentioning

156

Contrasting

Unclassified

Order By: Relevance

“…With more high-quality standardised data available, the (potential) impact of ML methods in regulatory toxicology is growing [4]. The collection of available toxicity data is increasing, thanks in part to high-throughput screening programs such as ToxCast [5] and Tox21 [6,7], but also with public-private partnerships such as the eTOX and eTRANSAFE projects, which focus on the sharing of (confidential) toxicity data and ML models across companies [8,9]. In any case, no matter which underlying data and ML method is used, it is essential to know or assess if the ML model can be reliably used to make predictions on a new dataset.…”

Section: Introductionmentioning

confidence: 99%

Assessing the calibration in toxicological in vitro models with conformal prediction

et al. 2021

View full text Add to dashboard Cite

Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power (often) drops in cases where the query samples have drifted from the training data’s descriptor space. Thus, the assumption for applying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Exemplified on the Tox21 datasets, composed of chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the prediction on the external sets, a strategy exchanging the calibration set with more recent data, such as Tox21Test, has successfully been introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy—exchanging the calibration data only—is convenient as it does not require retraining of the underlying model.

show abstract

Section: Introductionmentioning

confidence: 99%

Assessing the calibration in toxicological in vitro models with conformal prediction

et al. 2021

View full text Add to dashboard Cite

show abstract

“…The interaction between endocrine-disrupting chemicals and nuclear–receptor family proteins can affect the endocrine system in the AOP [ 4 ]. The Toxicology in the 21st Century (Tox21) program is a US federal research collaboration between the US Environmental Protection Agency, the National Toxicology Program, the National Center for Advancing Translational Sciences, and the Food and Drug Administration that aims to develop toxicity assessment methods for commercial chemicals, pesticides, food additives/contaminants, and medical products using quantitative high-throughput screening [ 5 , 6 , 7 ]. The Tox21 10K library consists of approximately 10,000 (10K) chemicals, including nearly 100 million data points obtained by in vitro quantitative high-throughput screening, indicating the toxicological risk of chemical compounds obtained using an in silico approach [ 8 , 9 ].…”

Section: Introductionmentioning

confidence: 99%

Prediction Models for Agonists and Antagonists of Molecular Initiation Events for Toxicity Pathways Using an Improved Deep-Learning-Based Quantitative Structure–Activity Relationship System

Matsuzaka

Totoki

Handa

et al. 2021

IJMS

View full text Add to dashboard Cite

In silico approaches have been studied intensively to assess the toxicological risk of various chemical compounds as alternatives to traditional in vivo animal tests. Among these approaches, quantitative structure–activity relationship (QSAR) analysis has the advantages that it is able to construct models to predict the biological properties of chemicals based on structural information. Previously, we reported a deep learning (DL) algorithm-based QSAR approach called DeepSnap-DL for high-performance prediction modeling of the agonist and antagonist activity of key molecules in molecular initiating events in toxicological pathways using optimized hyperparameters. In the present study, to achieve high throughput in the DeepSnap-DL system–which consists of the preparation of three-dimensional molecular structures of chemical compounds, the generation of snapshot images from the three-dimensional chemical structures, DL, and statistical calculations—we propose an improved DeepSnap-DL approach. Using this improved system, we constructed 59 prediction models for the agonist and antagonist activity of key molecules in the Tox21 10K library. The results indicate that modeling of the agonist and antagonist activity with high prediction performance and high throughput can be achieved by optimizing suitable parameters in the improved DeepSnap-DL system.

show abstract

“…The data must be well organised and structured in databases (e.g. Integrated Chemical Environment (ICE), 44 Tox21, 79 ToxCast, 80 ChEMBL, 70 PubChem 71 ) and made publicly available. 81 In addition, all of the scripts used to process or model the data should be available.…”

Section: Integrative Knowledge-driven Experimental Design For Reducing Animal Testingmentioning

confidence: 99%

Curated Data In — Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing

et al. 2021

View full text Add to dashboard Cite

New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, whereas models generated with uncurated data had a 7–24% higher correct classification rate (CCR), the perceived performance was, in fact, inflated owing to the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, to help ensure that reliable predictions of chemical toxicity are achieved through use of the models.

show abstract

The Tox21 10K Compound Library: Collaborative Chemistry Advancing Toxicology

Cited by 190 publications

References 53 publications

Assessing the calibration in toxicological in vitro models with conformal prediction

Assessing the calibration in toxicological in vitro models with conformal prediction

Prediction Models for Agonists and Antagonists of Molecular Initiation Events for Toxicity Pathways Using an Improved Deep-Learning-Based Quantitative Structure–Activity Relationship System

Curated Data In — Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing

Contact Info

Product

Resources

About