In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules: the well-established Laplacian-modified Naïve Bayes classifier (NB) and the Parzen-Rosenblatt Window, which was introduced to cheminformatics more recently. Both classifiers were trained in conjunction with circular fingerprints on a large data set of bioactive compounds extracted from ChEMBL, covering 894 human protein targets with more than 155,000 ligand-protein pairs. Because of its size and the number of bioactivity classes it contains, this data set is also provided as a benchmark for future target prediction methods. In addition to evaluating the methods, different performance measures were explored. This is not as straightforward as in binary classification settings, due to the number of classes, the possibility of multiple class memberships, and the need to translate model scores into "yes/no" predictions for assessing model performance. Both algorithms achieved a recall of correct targets that exceeds 80% in the top 1% of predictions. Performance depends significantly on the underlying diversity and size of a given class of bioactive compounds, with small classes and low structural similarity affecting both algorithms to different degrees. When tested on an external test set extracted from WOMBAT, covering more than 500 targets and constructed by excluding all compounds with a Tanimoto similarity above 0.8 to compounds from the ChEMBL data set, the methods achieved a recall of 63.3% and 66.6% among the top 1% for Naïve Bayes and Parzen-Rosenblatt Window, respectively. While these numbers indicate lower performance, they are also more realistic for settings where protein targets need to be established for novel chemical substances.
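As a rough illustration of the ideas named in this abstract, the following minimal sketch scores target classes with a Parzen-Rosenblatt window over Tanimoto distances and then measures recall among the top-ranked targets. It is not the authors' implementation: the Gaussian kernel, the smoothing parameter `h`, the toy fingerprints (sets of "on" bit indices), and the function names `tanimoto`, `parzen_scores` and `recall_in_top_k` are all assumptions made for this example.

```python
import math
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def parzen_scores(query, training, h=0.3):
    """Average Gaussian kernel value per target class.

    training: list of (fingerprint, target) pairs.
    h: smoothing parameter (hypothetical value, not taken from the paper).
    """
    kernel_sums = defaultdict(float)
    class_sizes = defaultdict(int)
    for fingerprint, target in training:
        distance = 1.0 - tanimoto(query, fingerprint)
        kernel_sums[target] += math.exp(-(distance ** 2) / (2 * h ** 2))
        class_sizes[target] += 1
    return {t: kernel_sums[t] / class_sizes[t] for t in kernel_sums}


def recall_in_top_k(ranked_targets, true_targets, k):
    """Fraction of the true targets found among the k highest-ranked targets."""
    top = set(ranked_targets[:k])
    return len(top & set(true_targets)) / len(true_targets)


# Toy data, invented for illustration only.
training = [
    (frozenset({1, 2, 3, 4}), "A"),
    (frozenset({1, 2, 3, 5}), "A"),
    (frozenset({7, 8, 9}), "B"),
    (frozenset({7, 8, 10}), "B"),
]
query = frozenset({1, 2, 3, 6})

scores = parzen_scores(query, training)
ranked = sorted(scores, key=scores.get, reverse=True)
recall = recall_in_top_k(ranked, ["A"], k=1)
```

In this toy setting the query shares three bits with each compound of class "A" and none with class "B", so "A" is ranked first. On a real data set the same ranking would be computed over all target classes, and recall would be averaged over the test compounds.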
Background: Nearly half of the world’s population (3.2 billion people) were at risk of malaria in 2015, and resistance to current therapies is a major concern. While the standard of care includes drug combinations, there is a pressing need to identify new combinations that can bypass current resistance mechanisms. In the work presented here, a combined transcriptional drug repositioning/discovery and machine learning approach is proposed. Methods: The integrated approach utilizes gene expression data from patient-derived samples, in combination with large-scale anti-malarial combination screening data, to predict synergistic compound combinations for three Plasmodium falciparum strains (3D7, DD2 and HB3). Both single compounds and combinations predicted to be active were prospectively tested experimentally. Results: One of the predicted single agents, apicidin, was active with AC50 values of 74.9, 84.1 and 74.9 nM in the 3D7, DD2 and HB3 P. falciparum strains, while its maximal safe plasma concentration in humans is 547.6 ± 136.6 nM. At a safe dose of 500 nM, apicidin kills on average 97% of the parasites. The synergy prediction algorithm exhibited overall precision and recall of 83.5% and 65.1% for mild-to-strong, 48.8% and 75.5% for moderate-to-strong, and 12.0% and 62.7% for strong synergies. Some of the prospectively predicted combinations, such as tacrolimus-hydroxyzine and raloxifene-thioridazine, exhibited significant synergy across the three P. falciparum strains included in the study. Conclusions: Systematic approaches can play an important role in accelerating the discovery of novel combination therapies for malaria, as they enable the selection of novel synergistic compound pairs in a more informed and cost-effective manner. Electronic supplementary material: The online version of this article (10.1186/s12936-018-2294-5) contains supplementary material, which is available to authorized users.
Small molecules are increasingly being used to induce the targeted differentiation of stem cells into different cell types. However, until now no systematic method for selecting suitable small molecules for this purpose has been presented. In this work, we propose an integrated and general bioinformatics- and cheminformatics-based approach for selecting small molecules that direct cellular differentiation in the desired way. The approach was successfully validated experimentally for differentiating stem cells into cardiomyocytes. All predicted compounds significantly enhanced the expression of cardiac progenitor markers (Gata4, Nkx2-5 and Mef2c) and mature cardiac markers (Actc1, Myh6) during and after cardiac progenitor formation. The best-performing compound, famotidine, increased the percentage of Myh6-positive cells from 33 to 56%, and enhanced the expression of the cardiac progenitor marker Nkx2-5 and the cardiac marker Tnnt2 at the protein level. The approach employed in the study is applicable to all other stem cell differentiation settings where gene expression data are available.
Differentiation therapy is attracting increasing interest in cancer as it can be more specific than conventional chemotherapy approaches, and it has offered new treatment options for some cancer types, such as the treatment of acute promyelocytic leukaemia (APL) with retinoic acid. However, there is a pressing need to identify additional molecules which act in this way, both in leukaemia and in other cancer types. In this work, we hence developed a novel transcriptional drug repositioning approach, based on both bioinformatics and cheminformatics components, that enables such compounds to be selected in a more informed manner. We validated the approach for leukaemia cells, and retinoic acid was successfully identified retrospectively using our method. Prospectively, the anti-parasitic compound fenbendazole was tested in leukaemia cells, and we were able to show that it can induce the differentiation of leukaemia cells to granulocytes at a low concentration of 0.1 μM and within as short a time period as 3 days. This work hence provides a systematic and validated approach for identifying small molecules for differentiation therapy in cancer.