Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
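The hit-rate comparison mentioned above can be illustrated with a minimal sketch. It assumes that hit rate means the fraction of true actives among the k top-ranked molecules and that a higher score means a better predicted binder; the abstract states neither convention. The function name, the scores and the activity labels are hypothetical toy values, not data from the study.

```python
def hit_rate_top_k(scores, is_active, k):
    """Fraction of true actives among the k best-scored molecules.

    Assumption: a higher score means a better predicted binder.
    """
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of molecules")
    # Rank molecules from best to worst score, keeping their activity labels.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(1 for _, active in top if active) / k


# Hypothetical toy library of six molecules (illustration only).
labels = [False, True, False, False, True, True]
classical_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
ml_scores = [0.2, 0.9, 0.1, 0.3, 0.8, 0.7]

classical_hit_rate = hit_rate_top_k(classical_scores, labels, k=2)
ml_hit_rate = hit_rate_top_k(ml_scores, labels, k=2)
fold_increase = ml_hit_rate / classical_hit_rate
```

Comparing the sets of actives retrieved by each ranking, rather than only their counts, would be the natural way to check whether the two types of SFs find different actives.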
The growing abundance of in vitro omics data, coupled with the necessity to reduce animal testing in the safety assessment of chemical compounds and even eliminate it in the evaluation of cosmetics, highlights the need for adequate computational methodologies. Data from omics technologies allow the exploration of a wide range of biological processes, therefore providing a better understanding of mechanisms of action (MoA) related to chemical exposure in biological systems. However, the analysis of these large datasets remains difficult due to the complexity of modulations spanning multiple biological processes. To address this, we propose a new computational workflow that combines knowledge on endogenous metabolism from a genome scale metabolic network (GSMN) and in vitro transcriptomics data with the aim of better identifying the metabolic MoA (mMoA) of chemicals. Our workflow proceeds in three main steps. The first step consists of building cell condition-specific models representing the metabolic impact of each exposure condition while taking into account the diversity of possible optimal solutions with a partial enumeration algorithm. In a second step, based on these enumerations, two conditions can be compared by extracting differentially activated reactions (DARs) between the two sets of enumerated possible condition-specific models. Finally, in the third step, DARs are grouped into clusters of functionally interconnected metabolic reactions using a distance-based clustering and subnetwork extraction method. The first part of the workflow was exemplified on eight molecules selected for their known human hepatotoxic outcomes associated with specific MoAs well described in the literature and for which we retrieved primary human hepatocytes (PHH) transcriptomic data in Open TG-GATEs. Then, we applied this new workflow to model and visualize associated mMoA for two of these eight molecules (amiodarone and valproic acid).
Despite large disparities in transcriptomic effects for these two chemicals, i.e., two differentially expressed genes (DEGs) for amiodarone vs 5709 DEGs for valproic acid, our results fit well with evidence from the literature regarding known MoAs. Beyond these confirmations, the workflow highlighted other potential, unexplored mMoAs.
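The second step of the workflow, extracting DARs from two sets of enumerated models, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each enumerated model is represented as a binary vector (1 if the reaction is active), and a reaction is called differentially activated when its activation frequency differs between conditions by at least a chosen threshold. The abstract does not state the actual criterion, and the function names, reaction identifiers, threshold and toy data are all hypothetical.

```python
def activation_frequency(solutions):
    """Per-reaction fraction of enumerated models in which the reaction is active.

    `solutions` is a list of equal-length binary lists, one per enumerated model.
    """
    n_solutions = len(solutions)
    return [sum(column) / n_solutions for column in zip(*solutions)]


def differentially_activated(control, treated, reaction_ids, threshold=0.5):
    """Return {reaction_id: frequency difference} for reactions whose activation
    frequency changes by at least `threshold` between the two conditions.

    Assumption: a simple frequency-difference cutoff stands in for the DAR
    criterion, which the abstract does not specify.
    """
    freq_control = activation_frequency(control)
    freq_treated = activation_frequency(treated)
    dars = {}
    for rid, f_c, f_t in zip(reaction_ids, freq_control, freq_treated):
        delta = f_t - f_c
        if abs(delta) >= threshold:
            dars[rid] = delta
    return dars


# Hypothetical toy enumerations: three models per condition, four reactions.
reaction_ids = ["R1", "R2", "R3", "R4"]
control = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1]]
treated = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 1]]

dars = differentially_activated(control, treated, reaction_ids)
```

In this toy case R2 is less often active and R3 more often active after exposure, so both are flagged, while R1 and R4 are unchanged. Working on frequencies across the enumerated set, instead of a single optimal model per condition, is what lets the comparison account for the diversity of possible optimal solutions.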
Motivation: Larger training datasets have been shown to improve the accuracy of Machine Learning (ML)-based Scoring Functions (SFs) for Structure-Based Virtual Screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with at least nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. Results: We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets). A three-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.