Proteomes are characterized by large protein-abundance differences, cell-type- and time-dependent expression patterns and post-translational modifications, all of which carry biological information that is not accessible by genomics or transcriptomics. Here we present a mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data, called ProteomicsDB. The information assembled from human tissues, cell lines and body fluids enabled estimation of the size of the protein-coding genome, and identified organ-specific proteins and a large number of translated lincRNAs (long intergenic non-coding RNAs). Analysis of messenger RNA and protein-expression profiles of human tissues revealed conserved control of protein abundance, and integration of drug-sensitivity data enabled the identification of proteins predicting resistance or sensitivity. The proteome profiles also hold considerable promise for analysing the composition and stoichiometry of protein complexes. ProteomicsDB thus enables navigation of proteomes, provides biological insight and fosters the development of proteomic technology.
Genome-, transcriptome- and proteome-wide measurements provide insights into how biological systems are regulated. However, fundamental aspects relating to which human proteins exist, where they are expressed and in which quantities are not fully understood. Therefore, we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project, representing human genes by 18,072 transcripts and 13,640 proteins, including 37 without prior protein-level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs; that few proteins show tissue-specific expression; that strong differences exist between mRNA and protein quantities within and across tissues; and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level, showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged, including the study of gene/protein expression regulation and the evaluation of biomarker specificity.
Plants are indispensable for life on Earth and represent organisms of extreme biological diversity with unique molecular capabilities 1. Here, we present a quantitative atlas of the transcriptomes, proteomes and phosphoproteomes of 30 tissues of the model plant Arabidopsis thaliana. It provides initial answers to how many genes exist as proteins (>18,000), where they are expressed, in which approximate quantities (>6 orders of magnitude dynamic range) and to what extent they are phosphorylated (>43,000 sites). We present examples of how the data may be used, for instance, to discover proteins translated from short open reading frames, to uncover sequence motifs involved in protein expression regulation, and to identify tissue-specific protein complexes or phosphorylation-mediated signaling events. Interactive access to this unique resource for the plant community is provided via ProteomicsDB and ATHENA, which include powerful bioinformatics tools to explore and characterize Arabidopsis proteins, their modifications and their interplay.
Main
The plant model organism Arabidopsis thaliana (AT) has revolutionized our understanding of plant biology and influenced many other areas of the life sciences 1. Knowledge derived from Arabidopsis has also provided mechanistic understanding of important agronomic traits in crop species 2. The Arabidopsis genome was sequenced 20 years ago, and hundreds of natural variants have since been analyzed at the genome and epigenome level 3,4. In contrast, the Arabidopsis proteome, the main executor of most biological processes, is far less comprehensively characterized. To address this gap, we used state-of-the-art mass spectrometry and RNA sequencing (RNA-seq) to provide the first integrated proteomic, phosphoproteomic and transcriptomic atlas of Arabidopsis.
Illustrated by selected examples, we show how this rich molecular resource can be used to explore the function of single proteins or entire pathways across multiple omics levels.
Multi-omics atlas of Arabidopsis
We generated an expression atlas covering, on average, 17,603 ± 1,317 transcripts, 14,430 ± 911 proteins and 14,689 ± 2,509 phosphorylation sites (p-sites) per tissue, using a reproducible biochemical and analytical approach (Fig. 1a,b; Extended Data Fig. 1a-c; Supplementary Data 1,2). In total, the protein expression data cover 18,210 of the 27,655 protein-coding genes (66%) annotated in Araport11 5. This is a substantial increase over the percentage of genes with protein-level evidence reported in UniProt (27%) 6, and more than double the number of proteins identified in an earlier tissue proteome analysis 7 (Fig. 1c, Extended Data Fig. 1d-f). In addition, we report tissue-resolved quantitative evidence for a total of 43,903 p-sites, making this study the most comprehensive single Arabidopsis phosphoproteome published to date (Fig. 1c). Overall, 47% of the expressed proteome was found to be phosphorylated in at least one instance, confirming earlier analyses of individual