Background:Low-cost, high-throughput in vitro bioassays have potential as alternatives to animal models for toxicity testing. However, incorporating in vitro bioassays into chemical toxicity evaluations such as read-across requires significant data curation and analysis based on knowledge of relevant toxicity mechanisms, lowering the enthusiasm of using the massive amount of unstructured public data.Objective:We aimed to develop a computational method to automatically extract useful bioassay data from a public repository (i.e., PubChem) and assess its ability to predict animal toxicity using a novel bioprofile-based read-across approach.Methods:A training database containing 7,385 compounds with diverse rat acute oral toxicity data was searched against PubChem to establish in vitro bioprofiles. Using a novel subspace clustering algorithm, bioassay groups that may inform on relevant toxicity mechanisms underlying acute oral toxicity were identified. These bioassays groups were used to predict animal acute oral toxicity using read-across through a cross-validation process. Finally, an external test set of over 600 new compounds was used to validate the resulting model predictivity.Results:Several bioassay clusters showed high predictivity for acute oral toxicity (positive prediction rates range from 62–100%) through cross-validation. After incorporating individual clusters into an ensemble model, chemical toxicants in the external test set were evaluated for putative acute toxicity (positive prediction rate equal to 76%). Additionally, chemical fragment–in vitro–in vivo relationships were identified to illustrate new animal toxicity mechanisms.Conclusions:The in vitro bioassay data-driven profiling strategy developed in this study meets the urgent needs of computational toxicology in the current big data era and can be extended to develop predictive models for other complex toxicity end points. https://doi.org/10.1289/EHP3614
Summary: We have developed a public Chemical In vitro-In vivo Profiling (CIIPro) portal, which can automatically extract in vitro biological data from public resources (i.e. PubChem) for usersupplied compounds. For compounds with in vivo target activity data (e.g. animal toxicity testing results), the integrated cheminformatics algorithm will optimize the extracted biological data using in vitro-in vivo correlations. The resulting in vitro biological data for target compounds can be used for read-across risk assessment of target compounds. Additionally, the CIIPro portal can identify the most similar compounds based on their optimized bioprofiles. The CIIPro portal provides new powerful assessment capabilities to the scientific community and can be easily integrated with other cheminformatics tools.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.