Small molecules are usually compared by their chemical structure, but there is no unified analytic framework for representing and comparing their biological activity. We present the Chemical Checker (CC), which provides processed, harmonized and integrated bioactivity data on ~800,000 small molecules. The CC divides data into five levels of increasing complexity, from the chemical properties of compounds to their clinical outcomes. In between, it includes targets, off-targets, networks and cell-level information, such as omics data, growth inhibition and morphology. Bioactivity data are expressed in a vector format, extending the concept of chemical similarity to similarity between bioactivity signatures. We show how CC signatures can aid drug discovery tasks, including target identification and library characterization. We also demonstrate the discovery of compounds that reverse and mimic biological signatures of disease models and genetic perturbations in cases that could not be addressed using chemical information alone. Overall, the CC signatures facilitate the conversion of bioactivity data to a format that is readily amenable to machine learning methods.
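The idea of comparing compounds by their bioactivity signatures rather than by structure can be illustrated with a minimal sketch. It assumes each signature is a plain numeric vector and uses cosine similarity as the comparison metric. The metric, the compound names and the values are illustrative assumptions, not taken from the CC itself.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

def rank_by_similarity(query, signatures):
    """Return (name, score) pairs sorted from most to least similar to query."""
    scores = {name: cosine_similarity(query, vec) for name, vec in signatures.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy signatures; real signatures would be much longer vectors.
signatures = {
    "cmpd_a": [1.0, 0.0, 0.5],
    "cmpd_b": [0.9, 0.1, 0.4],
    "cmpd_c": [-1.0, 0.2, -0.5],
}

ranking = rank_by_similarity(signatures["cmpd_a"], signatures)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting, a high positive score marks a compound whose signature mimics the query, while a strongly negative score marks one whose signature points in the opposite direction.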
Prion-like behavior has been in the spotlight since it was first associated with the onset of mammalian neurodegenerative diseases. However, a growing body of evidence suggests that this mechanism could be behind the regulation of processes such as transcription and translation in multiple species. Here, we perform a stringent computational survey to identify prion-like proteins in the human proteome. We detected 242 candidate polypeptides and computationally assessed their function, protein–protein interaction networks, tissular expression, and their link to disease. Human prion-like proteins constitute a subset of modular polypeptides broadly expressed across different cell types and tissues, significantly associated with disease, embedded in highly connected interaction networks, and involved in the flow of genetic information in the cell. Our analysis suggests that these proteins might play a relevant role not only in neurological disorders, but also in different types of cancer and viral infections.
Supplementary data are available at Bioinformatics online.
We present the Chemical Checker (CC), a resource that provides processed, harmonized and integrated bioactivity data on 800,000 small molecules. The CC divides data into five levels of increasing complexity, ranging from the chemical properties of compounds to their clinical outcomes. In between, it considers targets, off-targets, perturbed biological networks and several cell-based assays such as gene expression, growth inhibition and morphological profiling. In the CC, bioactivity data are expressed in a vector format, which naturally extends the notion of chemical similarity between compounds to similarities between bioactivity signatures of different kinds. We show how CC signatures can boost the performance of drug discovery tasks that typically capitalize on chemical descriptors, including target identification and library characterization. Moreover, we demonstrate and experimentally validate that CC signatures can be used to reverse and mimic biological signatures of disease models and genetic perturbations, options that are otherwise impossible using chemical information alone.
[...] an organic molecule (A: Chemistry) that interacts with one or several protein receptors (B: Targets), triggering perturbations of biological pathways (C: Networks) and eliciting phenotypic outcomes that can be measured in, e.g., cell-based assays (D: Cells) before delivery to patients (E: Clinics). Using these five categories, we classified the information stored in major compound databases, including chemogenomics resources, cell-based screens and, when available, clinical reports of drug effects (Methods). We then divided each level (A-E) into five sublevels (1-5) corresponding to distinct types or scopes of the data. In total, the CC contains 25 well-defined categories meant to illustrate the most relevant aspects of small molecule characterization.
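The five-by-five layout described above can be sketched as a small enumeration. The level names come from the text, whereas the variable and function names are hypothetical and do not reflect any actual CC interface.

```python
# Level letters and names as given in the text (A: Chemistry ... E: Clinics).
LEVELS = {
    "A": "Chemistry",
    "B": "Targets",
    "C": "Networks",
    "D": "Cells",
    "E": "Clinics",
}
SUBLEVELS = range(1, 6)  # sublevels 1 to 5 within each level

def category_codes():
    """Enumerate category codes such as 'A1' or 'E5', level by level."""
    return [f"{level}{sub}" for level in LEVELS for sub in SUBLEVELS]

codes = category_codes()
print(len(codes), codes[0], codes[-1])
```

Codes of this form (a level letter followed by a sublevel digit) are the labels used for the individual data types, such as A1 for 2D structures.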
In particular, we stored the 2D (A1) and 3D (A2) structures of compounds, together with their scaffolds (A3), functional groups (A4) and physicochemistry (A5). We also retrieved therapeutic targets (B1) and drug metabolizing enzymes (B2), and molecules co-crystallized with protein chains (B3). We fetched literature binding data (B4) from major chemogenomics databases, and high-throughput target screening results (B5). Moving to a higher order of biology, we looked for ontological classifications of compounds (C1) and focused on human metabolites in a genome-scale metabolic network (C2). In addition, we kept the pathways (C3), biological processes (C4) and protein-protein interactions (C5) of the previously collected binding data. To capture cell-level information, we gathered differential gene expression profiles (D1) and compound growth-inhibition potencies across cancer cell lines (D2). Similarly, we gathered sensitivity profiles over an array of yeast mutants (chemical genetics) (D3), as well as cell morphology changes (high-content screening) (D4). Additional cell sensitivity data available from the literature were also collected (D5). To organize clinical data, we used the traditio...