Background: In silico analyses are increasingly being used to support mode-of-action investigations; however, many such approaches do not utilise the large amounts of inactive data held in chemogenomic repositories. The objective of this work is to integrate such bioactivity data into the target prediction of orphan compounds, producing the probability of activity and inactivity for a range of targets. To this end, a novel human bioactivity data set was constructed through the assimilation of over 195 million bioactivity data points deposited in the ChEMBL and PubChem repositories, and the subsequent application of a sphere-exclusion selection algorithm to oversample presumed inactive compounds.
Results: A Bernoulli Naïve Bayes algorithm was trained using the data and evaluated using fivefold cross-validation, achieving a mean recall and precision of 67.7 and 63.8 % for active compounds and 99.6 and 99.7 % for inactive compounds, respectively. We show that the performance of the models is considerably influenced by the underlying intraclass training similarity, the size of a given class of compounds, and the degree of additional oversampling. The method was also validated using compounds extracted from WOMBAT, producing average precision-recall AUC and BEDROC scores of 0.56 and 0.85, respectively. Inactive data points used for this test are based on presumed inactivity, producing an approximated indication of the true extrapolative ability of the models. A distance-based applicability domain analysis was also conducted, indicating that an average Tanimoto Coefficient distance of 0.3 or greater between a test and training set can be used to give a global measure of confidence in model predictions. A final comparison to a method trained solely on active data from ChEMBL performed with precision-recall AUC and BEDROC scores of 0.45 and 0.76.
Conclusions: The inclusion of inactive data for model training produces models with superior AUC and improved early recognition capabilities, although the results from internal and external validation of the models show differing performance between the breadth of models. The realised target prediction protocol is available at https://github.com/lhm30/PIDGIN.
Graphical abstract: The inclusion of large scale negative training data for in silico target prediction improves the precision and recall AUC and BEDROC scores for target models.
Electronic supplementary material: The online version of this article (doi:10.1186/s13321-015-0098-y) contains supplementary material, which is available to authorized users.
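The abstract above names a sphere-exclusion selection step and a Tanimoto Coefficient distance without spelling either out. The following is a minimal sketch of how the two could fit together, not the authors' implementation: fingerprints are assumed to be sets of on-bit indices, and the function names, the toy compounds, and the radius of 0.5 are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

# Assumption: a fingerprint is represented as the set of indices of its on-bits.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b|; two empty fingerprints count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def sphere_exclusion(pool: List[Fingerprint], radius: float) -> List[int]:
    """Greedily keep compounds that lie at least `radius` from every kept one.

    Compounds are visited in list order, so the result depends on that order.
    """
    picked: List[int] = []
    for i, fp in enumerate(pool):
        if all(tanimoto_distance(fp, pool[j]) >= radius for j in picked):
            picked.append(i)
    return picked


# Hypothetical toy pool of four compounds (not data from the paper).
pool = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),   # close to compound 0, falls inside its sphere
    frozenset({7, 8, 9}),      # shares no bits with compound 0
    frozenset({7, 8, 9, 10}),  # close to compound 2, falls inside its sphere
]
selected = sphere_exclusion(pool, radius=0.5)
print(selected)
```

With this toy pool, only one representative of each pair of near-duplicates is kept, which is the diversity-preserving behaviour a sphere-exclusion step is meant to provide. A larger radius gives a sparser, more diverse selection, and a smaller radius keeps more compounds.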
In image-based profiling, software extracts thousands of morphological features of cells from multi-channel fluorescence microscopy images, yielding single-cell profiles that can be used for basic research and drug discovery. Powerful applications have been proven, including clustering chemical and genetic perturbations based on their similar morphological impact, identifying disease phenotypes by observing differences in profiles between healthy and diseased cells, and predicting assay outcomes using machine learning, among many others. Here we provide an updated protocol for the most popular assay for image-based profiling, Cell Painting. Introduced in 2013, it uses six stains imaged in five channels and labels eight diverse components of the cell: DNA, cytoplasmic RNA, nucleoli, actin, Golgi apparatus, plasma membrane, endoplasmic reticulum, and mitochondria. The original protocol was updated in 2016 based on several years' experience running it at two sites, after optimizing it by visual stain quality. Here we describe the work of the Joint Undertaking for Morphological Profiling (JUMP) Cell Painting Consortium, which aims to improve upon the assay via quantitative optimization, based on the measured ability of the assay to detect morphological phenotypes and group similar perturbations together. We find that the assay gives very robust outputs despite a variety of changes to the protocol and that two vendors' dyes work equivalently well. We present Cell Painting version 3, in which some steps are simplified and several stain concentrations can be reduced, saving costs. Cell culture and image acquisition take 1 to 2 weeks for a typically sized batch of 20 or fewer plates; feature extraction and data analysis take an additional 1 to 2 weeks.
Overactivation of PI3K/Akt/mTOR is linked with carcinogenesis and serves as a potential molecular therapeutic target in the treatment of various cancers. Herein, we report the synthesis of trisubstituted imidazoles and identify 2-chloro-3-(4,5-diphenyl-1H-imidazol-2-yl)pyridine (CIP) as the lead cytotoxic agent. A Naïve Bayes classifier model for in silico target prediction revealed that CIP targets RAC-beta serine/threonine-protein kinase, a member of the Akt family. Furthermore, CIP downregulated the phosphorylation of Akt, PDK and mTOR proteins, decreased the expression of cyclin D1, Bcl-2, survivin, VEGF and procaspase-3, and increased the cleavage of PARP. In addition, CIP significantly downregulated the CXCL12-induced motility of breast cancer cells, and molecular docking calculations revealed that all compounds bind to Akt2 kinase with high docking scores compared to the library of previously reported Akt2 inhibitors. In summary, we report the synthesis and biological evaluation of imidazoles that induce apoptosis in breast cancer cells by negatively regulating the PI3K/Akt/mTOR signaling pathway.
Image-based profiling has emerged as a powerful technology for various steps in basic biological and pharmaceutical discovery, but the community has lacked a large, public reference set of data from chemical and genetic perturbations. Here we present data generated by the Joint Undertaking for Morphological Profiling (JUMP)-Cell Painting Consortium, a collaboration between 10 pharmaceutical companies, six supporting technology companies, and two non-profit partners. When completed, the dataset will contain images and profiles from the Cell Painting assay for over 116,750 unique compounds, over-expression of 12,602 genes, and knockout of 7,975 genes using CRISPR-Cas9, all in human osteosarcoma cells (U2OS). The dataset is estimated to be 115 TB in size and to capture 1.6 billion cells and their single-cell profiles. File quality control and upload are underway and will be completed over the coming months at the Cell Painting Gallery: https://registry.opendata.aws/cellpainting-gallery. A portal to visualize a subset of the data is available at https://phenaid.ardigen.com/jumpcpexplorer/.
One important but poorly understood concept of Traditional Chinese Medicine (TCM) is that of the hot, cold, and neutral nature of its bioactive principles. To advance the field, in this study we analyzed compound-nature pairs from TCM on a large scale (>23 000 structures), using chemical space visualizations to understand its physicochemical domain and in silico target prediction to understand differences related to their modes-of-action (MoA) against proteins. We found that overall TCM natures spread into different subclusters with specific molecular patterns, as opposed to forming coherent global groups. Compounds associated with cold nature had a lower clogP and contained more aliphatic rings than the other groups, and were found to control detoxification, heat-clearing, and heart development processes and to have sedative function, associated with "Mental and behavioural disorders" diseases. Compounds associated with hot nature, in contrast, were on average of lower molecular weight and had more aromatic ring systems than other groups; they frequently seemed to control body temperature, have cardio-protection function, improve fertility and sexual function, and represent excitatory or activating effects, associated with "endocrine, nutritional and metabolic diseases" and "diseases of the circulatory system". Compounds associated with neutral nature had a higher polar surface area and contained more cyclohexene moieties than other groups and seemed to be related to memory function, suggesting that their nature may be a useful guide for their utility in neural degenerative diseases. We were hence able to elucidate, for the first time, the difference between nature classes in TCM on the molecular level and on a large data set, thereby supporting a better understanding of TCM nature theory and bridging the gap between traditional medicine and our current understanding of the human body.