Mass spectrometric based methods for absolute quantification of proteins, such as QconCAT, rely on internal standards of stable-isotope labeled reference peptides, or "Q-peptides," to act as surrogates. Key to the success of this and related methods for absolute protein quantification (such as AQUA) is selection of the Q-peptide. Here we describe a novel method, CONSeQuence (consensus predictor for Q-peptide sequence), based on four different machine learning approaches for Q-peptide selection. CONSeQuence demonstrates improved performance over existing methods for optimal Q-peptide selection in the absence of prior experimental information, as validated using two independent test sets derived from yeast. Furthermore, we examine the physicochemical parameters associated with good peptide surrogates, and demonstrate that in addition to charge and hydrophobicity, peptide secondary structure plays a significant role in determining peptide "detectability" in liquid chromatography-electrospray ionization experiments. We relate peptide properties to protein tertiary structure, demonstrating a counterintuitive preference for buried status for frequently detected peptides. Finally, we demonstrate the improved efficacy of the general approach by applying a predictor trained on yeast data to sets of proteotypic peptides from two additional species taken from an existing peptide identification repository.
Molecular & Cellular Proteomics 10: 10.1074/mcp.M110.003384, 1-12, 2011.
The study of cellular systems via identification of their protein components is becoming almost commonplace with the continuing advances in mass spectrometric hardware and associated analytical software. The current drive is now to assign accurate quantitative information to these protein components, such that the data can be used in systems modeling studies.
For these models to be effective and the dynamics of these biochemical systems simulated during activation or perturbation, absolute rather than relative quantitative information must be provided (1). Even in the absence of sophisticated modeling studies, evaluating both the qualitative and quantitative aspects of biological networks can permit understanding of the complex interplay of system components, as well as the identification of disease biomarkers (2-4).
Given the dependence of mass spectrometric signal intensity on the nature of the analyte, methods for absolute protein quantification primarily rely on standardization of signal intensity with known quantities of isotope-labeled references that are identical in primary structure to the analyte (5-7). In a typical such proteomics experiment, quantification is performed at the peptide level, often by virtue of selected reaction monitoring experiments, using defined amounts of pure isotope-labeled tryptic peptide as reference for unknown quantities of the tryptic hydrolysate of the protein of interest. Protein amount is subsequently inferred. This can also be combined with label-free approaches to infer absolute peptide quantifications using a...
In this paper, we discuss the challenge of large-scale quantification of a proteome, referring to our programme that aims to define the absolute quantity, in copies per cell, of at least 4000 proteins in the yeast Saccharomyces cerevisiae. We have based our strategy on the well-established method of stable isotope dilution, generating isotopically labelled peptides using QconCAT technology, in which artificial genes, encoding concatenations of tryptic fragments as surrogate quantification standards, are designed, synthesised de novo and expressed in bacteria using stable isotopically enriched media. A known quantity of QconCAT is then co-digested with analyte proteins and the heavy:light isotopologues are analysed by mass spectrometry to yield absolute quantification. This workflow brings issues of optimal selection of quantotypic peptides, their assembly into QconCATs, expression, purification and deployment.
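The arithmetic behind stable isotope dilution can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, its parameters, and the example numbers are all assumptions, and it presumes complete digestion, one copy of the Q-peptide per protein molecule, and a known cell count.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(light_to_heavy_ratio: float,
                    standard_fmol: float,
                    cell_count: float) -> float:
    """Estimate protein copies per cell from one light:heavy peptide ratio.

    light_to_heavy_ratio: analyte (light) signal divided by standard (heavy) signal.
    standard_fmol: amount of heavy standard added to the sample, in femtomoles.
    cell_count: number of cells that contributed protein to the sample.
    All three inputs are hypothetical values supplied by the caller.
    """
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    analyte_mol = light_to_heavy_ratio * standard_fmol * 1e-15  # fmol -> mol
    return analyte_mol * AVOGADRO / cell_count


# Illustrative numbers only: ratio 0.5, 100 fmol of standard, 10 million cells.
print(round(copies_per_cell(0.5, 100.0, 1e7)))
```

Because the heavy and light isotopologues are chemically identical, their signal ratio is taken to equal their molar ratio, which is why a single multiplication by the known amount of standard suffices.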
Quantitative proteomics experiments are usually performed using proteolytic peptides as surrogates for their parent proteins, inferring protein amounts from peptide-level quantitation. This process is frequently dependent on complete digestion of the parent protein to its limit peptides so that their signal is truly representative. Unfortunately, proteolysis is often incomplete, and missed cleavage peptides are frequently produced that are unlikely to be optimal surrogates for quantitation, particularly for label-mediated approaches seeking to derive absolute values. We have generated a computational tool that predicts which candidate peptide bonds are likely to be missed by the standard enzyme trypsin. Our cross-validated prediction tool uses support vector machines and achieves a precision (PPV) in excess of 0.94, with an attendant sensitivity of 0.79, across multiple proteomes. We believe this is a useful tool for selecting candidate quantotypic peptides, seeking to minimize likely loss owing to missed cleavage, which will be a boon for quantitative proteomic pipelines as well as other areas of proteomics. Our results are discussed in the context of recent results examining the kinetics of missed cleavages in proteomic digestion protocols, and show agreement with observed experimental trends. The software has been made available at http://king.smith.man.ac.uk/mcpred.
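To make the distinction between limit peptides and missed cleavage peptides concrete, here is a minimal in silico digestion sketch. It is not the prediction tool described above: it only applies the commonly used simplified trypsin rule (cleave C-terminal to K or R, except before P), and the function names and the example sequence are assumptions made for illustration.

```python
def cleavage_sites(protein: str) -> list[int]:
    """Positions i where trypsin may cut between protein[i-1] and protein[i].

    Simplified rule: cut after K or R unless the next residue is P.
    """
    return [
        i for i in range(1, len(protein))
        if protein[i - 1] in "KR" and protein[i] != "P"
    ]


def digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Return peptides containing exactly `missed_cleavages` uncut sites.

    missed_cleavages=0 gives the limit peptides of a complete digest.
    """
    bounds = [0] + cleavage_sites(protein) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        end = start + missed_cleavages + 1
        if end < len(bounds):
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


example = "MAGKTLRPEEKSSR"  # hypothetical sequence, not a real protein
print(digest(example))                      # limit peptides
print(digest(example, missed_cleavages=1))  # one missed cleavage each
```

A missed-cleavage predictor would score each position returned by `cleavage_sites`; peptides flanked by sites that are likely to be missed are poorer candidates for quantification.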
Defining intracellular protein concentration is critical in molecular systems biology. Although strategies for determining relative protein changes are available, defining robust absolute values in copies per cell has proven significantly more challenging. Here we present a reference data set quantifying over 1800 Saccharomyces cerevisiae proteins by direct means using protein-specific stable-isotope labeled internal standards and selected reaction monitoring (SRM) mass spectrometry, far exceeding any previous study. This was achieved by careful design of over 100 QconCAT recombinant proteins as standards, defining 1167 proteins in terms of copies per cell and upper limits on a further 668, with robust CVs routinely less than 20%. The selected reaction monitoring-derived proteome is compared with existing quantitative data sets, highlighting the disparities between methodologies. Coupled with a quantification of the transcriptome by RNA-seq taken from the same cells, these data support revised estimates of several fundamental molecular parameters: a total protein count of ∼100 million molecules-per-cell, a median of ∼1000 proteins-per-transcript, and a linear model of protein translation explaining 70% of the variance in translation rate. This work contributes a “gold-standard” reference yeast proteome (including 532 values based on high quality, dual peptide quantification) that can be widely used in systems models and for other comparative studies.
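The coefficient of variation (CV) quoted above is the standard deviation of replicate measurements expressed as a percentage of their mean. A minimal sketch follows, assuming the sample standard deviation is used; the replicate values are invented for illustration and are not taken from the data set.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """Return the CV in percent, using the sample standard deviation."""
    if len(values) < 2:
        raise ValueError("need at least two replicate values")
    average = mean(values)
    if average == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / average


# Hypothetical copies-per-cell estimates from three replicates.
replicates = [950.0, 1000.0, 1050.0]
print(coefficient_of_variation(replicates))
```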
Background: The control of gene expression in eukaryotic cells occurs both transcriptionally and posttranscriptionally. Although many genes are now known to be regulated at the translational level, in general, the mechanisms are poorly understood. We have previously presented polysomal gradient and array-based evidence that translational control is widespread in a significant number of genes when yeast cells are exposed to a range of stresses. Here we have re-examined these gene sets, considering the role of UTR sequences in the translational responses of these genes using recent large-scale datasets which define 5' and 3' transcriptional ends for many yeast genes. In particular, we highlight the potential role of 5' UTRs and upstream open reading frames (uORFs).