Compared to traditional experimental approaches, computational modeling is a promising strategy for efficiently prioritizing new candidates at low cost. In this study, we developed a novel data mining and computational modeling workflow and demonstrated its applicability by screening for new analgesic opioids. To this end, a large opioid data set was used as the probe to automatically obtain bioassay data from the PubChem portal. In total, 114 PubChem bioassays were selected to build quantitative structure-activity relationship (QSAR) models based on the testing results across the probe compounds. For each bioassay, the tested compounds were used to develop 12 models, one for each combination of three machine learning approaches and four types of chemical descriptors. Model performance was evaluated by the coefficient of determination (R²) obtained from 5-fold cross-validation. In total, 49 models developed for 14 bioassays met the selection criteria, and these bioassays were found to be mainly associated with binding affinities to different opioid receptors. The models for these 14 bioassays were further used to fill data gaps in the probe opioid data set and to make predictions for general drug compounds in the DrugBank data set. This study provides a universal modeling strategy that can take advantage of large public data sets for computer-aided drug design (CADD).
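The model grid and evaluation metric described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not name the three learners or the four descriptor types, so the labels below are hypothetical placeholders, a one-variable least-squares line stands in for a real QSAR learner, and R² is computed on pooled out-of-fold predictions, which is one common convention and is assumed here.

```python
from itertools import product
import random

# Hypothetical placeholder labels: the abstract does not name the learners
# or descriptor types, only that there are three and four of them.
LEARNERS = ["learner_A", "learner_B", "learner_C"]
DESCRIPTORS = ["descriptor_1", "descriptor_2", "descriptor_3", "descriptor_4"]

# One model per (learner, descriptor) pair: 3 x 4 = 12 models per bioassay.
model_grid = list(product(LEARNERS, DESCRIPTORS))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for a real learner)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, k=5, seed=0):
    """R² over pooled out-of-fold predictions from k-fold cross-validation."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    predictions = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Each compound is predicted only by a model that never saw it.
        for i in fold:
            predictions[i] = slope * xs[i] + intercept
    return r_squared(ys, predictions)
```

In a real workflow, each entry of `model_grid` would be trained on one bioassay's compounds and scored with `cross_validated_r2`, and only models passing a chosen R² threshold would be retained; the abstract does not state that threshold.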
Chemical toxicity evaluations for drugs, consumer products, and environmental chemicals have a critical impact on human health. Traditional animal models for evaluating chemical toxicity are expensive, time-consuming, and often fail to detect toxicants in humans. Computational toxicology is a promising alternative approach that uses machine learning (ML) and deep learning (DL) techniques to predict the toxicity potential of chemicals. Although ML- and DL-based computational models are attractive for chemical toxicity prediction, many toxicity models are "black boxes" in nature and difficult for toxicologists to interpret, which hampers their use in chemical risk assessment. Recent progress in interpretable ML (IML) in the computer science field meets this urgent need to unveil the underlying toxicity mechanisms and elucidate the domain knowledge of toxicity models. In this review, we focus on the applications of IML in computational toxicology, including toxicity feature data, model interpretation methods, the use of knowledge base frameworks in IML development, and recent applications. The challenges and future directions of IML modeling in toxicology are also discussed. We hope this review will encourage efforts to develop interpretable models with new IML algorithms that can assist new chemical assessments by illustrating toxicity mechanisms in humans.