Proteomes are characterized by large protein-abundance differences, cell-type- and time-dependent expression patterns and post-translational modifications, all of which carry biological information that is not accessible by genomics or transcriptomics. Here we present a mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data, called ProteomicsDB. The information assembled from human tissues, cell lines and body fluids enabled estimation of the size of the protein-coding genome, and identified organ-specific proteins and a large number of translated lincRNAs (long intergenic non-coding RNAs). Analysis of messenger RNA and protein-expression profiles of human tissues revealed conserved control of protein abundance, and integration of drug-sensitivity data enabled the identification of proteins predicting resistance or sensitivity. The proteome profiles also hold considerable promise for analysing the composition and stoichiometry of protein complexes. ProteomicsDB thus enables navigation of proteomes, provides biological insight and fosters the development of proteomic technology.
Kinase inhibitors are important cancer therapeutics. Polypharmacology is commonly observed, requiring thorough target deconvolution to understand drug mechanism of action. Using chemical proteomics, we analyzed the target spectrum of 243 clinically evaluated kinase drugs. The data revealed previously unknown targets for established drugs, offered a perspective on the "druggable" kinome, highlighted (non)kinase off-targets, and suggested potential therapeutic applications. Integration of phosphoproteomic data refined drug-affected pathways, identified response markers, and strengthened rationale for combination treatments. We exemplify translational value by discovering SIK2 (salt-inducible kinase 2) inhibitors that modulate cytokine production in primary cells, by identifying drugs against the lung cancer survival marker MELK (maternal embryonic leucine zipper kinase), and by repurposing cabozantinib to treat FLT3-ITD-positive acute myeloid leukemia. This resource, available via the ProteomicsDB database, should facilitate basic, clinical, and drug discovery research and aid clinical decision-making.
Genome‐, transcriptome‐ and proteome‐wide measurements provide insights into how biological systems are regulated. However, fundamental aspects relating to which human proteins exist, where they are expressed and in which quantities are not fully understood. Therefore, we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project representing human genes by 18,072 transcripts and 13,640 proteins including 37 without prior protein‐level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs, that few proteins show tissue‐specific expression, that strong differences between mRNA and protein quantities within and across tissues exist and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged including the study of gene/protein expression regulation and biomarker specificity evaluation.
The intestinal microbiota is known to regulate host energy homeostasis and can be influenced by high-calorie diets. However, changes affecting the ecosystem at the functional level are still not well characterized. We measured shifts in cecal bacterial communities in mice fed a carbohydrate or high-fat (HF) diet for 12 weeks at the level of the following: (i) diversity and taxa distribution by high-throughput 16S ribosomal RNA gene sequencing; (ii) bulk and single-cell chemical composition by Fourier-transform infrared (FT-IR) and Raman micro-spectroscopy and (iii) metaproteome and metabolome via high-resolution mass spectrometry. High-fat diet caused shifts in the diversity of dominant gut bacteria and altered the proportion of Ruminococcaceae (decrease) and Rikenellaceae (increase). FT-IR spectroscopy revealed that the impact of the diet on cecal chemical fingerprints is greater than the impact of microbiota composition. Diet-driven changes in biochemical fingerprints of members of the Bacteroidales and Lachnospiraceae were also observed at the level of single cells, indicating that there were distinct differences in cellular composition of dominant phylotypes under different diets. Metaproteome and metabolome analyses based on the occurrence of 1760 bacterial proteins and 86 annotated metabolites revealed distinct HF diet-specific profiles. Alteration of hormonal and anti-microbial networks, bile acid and bilirubin metabolism and shifts towards amino acid and simple sugars metabolism were observed. We conclude that a HF diet markedly affects the gut bacterial ecosystem at the functional level.
Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target-decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ~19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software. Molecular & Cellular
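The pairing step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each protein already has one target score and/or one decoy score where higher is better (for example, the negative log10 of the best peptide q-value), that ties go to the target, and that FDR is estimated as the cumulative ratio of decoys to targets. The function names `picked_protein_fdr` and `accepted_targets` and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float, bool, float]]  # (protein, score, is_decoy, fdr)


def picked_protein_fdr(target_scores: Dict[str, float],
                       decoy_scores: Dict[str, float]) -> Ranked:
    """Keep only the higher-scoring member of each target/decoy pair,
    then estimate a running FDR down the ranked list (assumed estimator:
    cumulative decoys / cumulative targets)."""
    picked = []
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein)
        d = decoy_scores.get(protein)
        # Assumption: a tie is resolved in favour of the target.
        if d is None or (t is not None and t >= d):
            picked.append((protein, t, False))
        else:
            picked.append((protein, d, True))

    # Best score first; on equal scores, targets before decoys, then by name.
    picked.sort(key=lambda row: (-row[1], row[2], row[0]))

    ranked: Ranked = []
    targets = decoys = 0
    for protein, score, is_decoy in picked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        ranked.append((protein, score, is_decoy, decoys / max(targets, 1)))
    return ranked


def accepted_targets(target_scores: Dict[str, float],
                     decoy_scores: Dict[str, float],
                     max_fdr: float = 0.01) -> List[str]:
    """Return target proteins whose q-value (monotone running minimum of
    the FDR, taken from the bottom of the list) is at most max_fdr."""
    ranked = picked_protein_fdr(target_scores, decoy_scores)
    q_values = []
    q = float("inf")
    for _, _, _, fdr in reversed(ranked):
        q = min(q, fdr)
        q_values.append(q)
    q_values.reverse()
    return [protein
            for (protein, _, is_decoy, _), q in zip(ranked, q_values)
            if not is_decoy and q <= max_fdr]


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    targets = {"A": 9.0, "B": 7.5, "C": 3.0, "D": 2.0}
    decoys = {"A": 1.0, "C": 4.0, "D": 1.5, "E": 2.5}
    print(accepted_targets(targets, decoys, max_fdr=0.01))
```

In the classic strategy every target and every decoy entry would enter the ranked list; here a decoy is counted only when it outscores its paired target, which is the difference the abstract attributes to the "picked" approach.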