Most computational methods for predicting the toxicity of chemicals are based on "nonmechanistic" cheminformatics solutions that rely on an arsenal of QSAR descriptors, often only vaguely associated with chemical structures, and that typically employ "black-box" mathematical algorithms. Nonetheless, such machine learning models, despite their lower generalization capacity and interpretability, typically achieve very high accuracy in predicting various toxicity endpoints, as reflected by the results of the recent Tox21 competition. In the current study, we capitalize on the power of modern AI to predict Tox21 benchmark data using only simple 2D drawings of chemicals, without employing any chemical descriptors. In particular, we processed plain 2D sketches of molecules with a supervised 2D convolutional neural network (2DConvNet) and demonstrated that modern image recognition technology yields prediction accuracies comparable to those of state-of-the-art cheminformatics tools. Furthermore, the performance of the image-based 2DConvNet model was comparatively evaluated on an external set of compounds from the Prestwick chemical library, which led to the experimental identification of significant and previously unreported antiandrogen potential for several well-established generic drugs.
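To make the idea of learning from a drawing rather than from descriptors concrete, the following is a minimal sketch of the basic operation a 2D convolutional layer applies to an image. It is not the 2DConvNet described in the abstract: the toy 5x5 "sketch", the hand-written `vertical_stroke` kernel, and all function names are illustrative assumptions, and a real network would learn many kernels from data.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Valid-mode 2D cross-correlation, the operation deep learning
    libraries usually call 'convolution'."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map: Image) -> Image:
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map: Image) -> float:
    """Collapse a feature map to its single strongest response."""
    return max(max(row) for row in feature_map)


# Hypothetical 5x5 binary image: one vertical stroke, standing in for a
# bond line in a 2D drawing of a molecule.
sketch: Image = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]

# Hypothetical hand-written kernel that responds to vertical strokes.
vertical_stroke: Image = [[-1.0, 2.0, -1.0] for _ in range(3)]

feature_map = conv2d_valid(sketch, vertical_stroke)
response = global_max_pool(relu(feature_map))
```

The kernel responds most strongly where it is centred on the stroke, so the pooled `response` summarizes whether that pattern appears anywhere in the image. Stacking many learned kernels of this kind is what lets an image model work directly from pixels.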
Motivation: Recent advances in bioinformatics and chemogenomics are poised to accelerate the discovery of small-molecule regulators of cell development. Combining large genomics and molecular data sources with powerful deep learning techniques has the potential to revolutionize predictive biology. In this study, we present Deep Compound Profiler (DeepCOP), a deep learning-based model that can predict the gene-regulating effects of low-molecular-weight compounds. This model can be used for direct identification of a drug candidate causing a desired gene expression response, without using any information on its interactions with protein target(s). Results: We successfully combined molecular fingerprint descriptors and gene descriptors (derived from GO terms) to train deep neural networks that predict differential gene regulation endpoints collected in the LINCS database. We achieved 10-fold cross-validation RAUC scores of 0.80 and above, as well as enrichment factors of > 5. We validated our models using an external RNA-Seq dataset generated in-house that described the effect of three potent antiandrogens (with different modes of action) on gene expression in the LNCaP prostate cancer cell line. The results of this pilot study demonstrate that deep learning models can effectively synergize molecular and genomic descriptors and can be used to screen for novel drug candidates with the desired effect on gene expression. We anticipate that such models can find broad use in developing novel cancer therapeutics and can facilitate precision oncology efforts. Supplementary information: Supplementary data are available at Bioinformatics online.
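For readers unfamiliar with the two kinds of metric quoted above, the following is a minimal sketch of how a ranking AUC and an enrichment factor can be computed from binary labels and model scores. The abstract does not give the exact definitions used, so both are assumptions: the AUC is the standard pairwise ROC AUC, and the enrichment factor is the hit rate in the top-scored fraction divided by the overall hit rate. The function names, the `top_fraction` parameter, and the toy data are all hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random
    negative; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def enrichment_factor(
    labels: Sequence[int], scores: Sequence[float], top_fraction: float = 0.1
) -> float:
    """Hit rate among the top-scored fraction divided by the overall hit
    rate (one common definition, assumed here)."""
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("no positives in the data")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (total_hits / len(labels))


# Toy data (hypothetical): 10 compound-gene pairs, 2 true responders.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(toy_labels, toy_scores)
ef = enrichment_factor(toy_labels, toy_scores, top_fraction=0.2)
```

An enrichment factor above 1 means that the top of the ranked list contains proportionally more true responders than the dataset as a whole, which is the property that matters when only a small number of top-ranked candidates can be tested experimentally.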
In recent years, the field of quantitative structure−activity/property relationship (QSAR/QSPR) modeling has developed into a stable technology capable of reliably predicting new bioactive molecules. With the availability of inexpensive commercial sources of both synthetic chemicals and bioactivity assays, a cheminformatics-savvy scientist can readily establish a virtual drug discovery enterprise. A skilled computational chemist can not only develop a computer-aided drug discovery pipeline but also acquire the drug candidates, or have them made, inexpensively for economical screening of desired on-target activity, critical off-target effects, and essential drug-likeness properties. As part of our drug discovery pipeline, a novel machine-learning model was built to relate chemical structures of synthetically accessible molecules to their prices. The model was trained from our "in stock" and "made on demand" diverse chemical entities, ranging