Comprehensive knowledge of proteome complexity is crucial to understanding cell function. Amino termini of yeast proteins were identified through peptide mass spectrometry on glutaraldehyde-treated cell lysates as well as a parallel assessment of publicly deposited spectra. An unexpectedly large fraction of detected amino-terminal peptides (35%) mapped to translation initiation at AUG codons downstream of the annotated start codon. Many of the implicated genes have suboptimal sequence contexts for translation initiation near their annotated AUG, and their ribosome profiles show elevated tag densities consistent with translation initiation at downstream AUGs as well as at their annotated AUGs. These data suggest that a substantial fraction of the yeast proteome derives from initiation at downstream AUGs, significantly increasing the repertoire of encoded proteins and their potential functions and cellular localizations.
Peptide mass spectrometry relies crucially on algorithms that match
peptides to spectra. We describe a method to evaluate the accuracy
of these algorithms based on the masses of parent proteins before
trypsin endoprotease digestion. Measurement of conformance to parent
proteins provides a score for comparison of the performances of different
algorithms as well as alternative parameter settings for a given algorithm.
Tracking of conformance scores for spectrum matches to proteins with
progressively lower expression levels revealed that conformance scores
are not uniform within data sets but are significantly lower for less
abundant proteins. Similarly, peptides with lower peptide-spectrum
match scores have lower conformance. Although peptide mass spectrometry
data is typically filtered through decoy analysis to ensure a low
false discovery rate, this analysis confirms that the filtered data
should not be considered as having a uniform confidence. The analysis
suggests that using different algorithms and multiple standardized
parameter settings for these algorithms can significantly increase
the number of peptides identified. This data set can be used as a
resource for future algorithm assessment.
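
The conformance idea described above can be illustrated with a minimal sketch. The abstract does not state how parent protein masses were measured or how conformance was scored, so everything here is an assumption made for illustration: each spectrum is taken to come from a sample fraction with a known intact-protein mass range, a peptide-spectrum match (PSM) "conforms" when the mass of its assigned parent protein falls within that range (with a hypothetical relative tolerance), and the conformance score is the fraction of PSMs that conform. The names `PSM`, `conforms`, `conformance_score`, and `conformance_split` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (all fields are illustrative assumptions)."""
    peptide: str
    protein_mass_kda: float    # mass of the intact parent protein assigned by the algorithm
    fraction_low_kda: float    # assumed lower mass bound of the fraction the spectrum came from
    fraction_high_kda: float   # assumed upper mass bound of that fraction
    match_score: float         # algorithm's peptide-spectrum match score


def conforms(psm: PSM, tolerance: float = 0.2) -> bool:
    """True if the parent protein mass lies in the fraction's range, widened by a relative tolerance."""
    low = psm.fraction_low_kda * (1.0 - tolerance)
    high = psm.fraction_high_kda * (1.0 + tolerance)
    return low <= psm.protein_mass_kda <= high


def conformance_score(psms: Iterable[PSM], tolerance: float = 0.2) -> Optional[float]:
    """Fraction of PSMs whose parent protein conforms; None for an empty set."""
    items = list(psms)
    if not items:
        return None
    return sum(conforms(p, tolerance) for p in items) / len(items)


def conformance_split(
    psms: Iterable[PSM], score_threshold: float, tolerance: float = 0.2
) -> Tuple[Optional[float], Optional[float]]:
    """Conformance for PSMs at or above the match-score threshold, and for those below it."""
    items = list(psms)
    high = [p for p in items if p.match_score >= score_threshold]
    low = [p for p in items if p.match_score < score_threshold]
    return conformance_score(high, tolerance), conformance_score(low, tolerance)


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    example = [
        PSM("PEPTIDEA", 50.0, 40.0, 60.0, 0.9),
        PSM("PEPTIDEB", 120.0, 40.0, 60.0, 0.2),
        PSM("PEPTIDEC", 45.0, 40.0, 60.0, 0.8),
        PSM("PEPTIDED", 30.0, 40.0, 60.0, 0.1),
    ]
    print(conformance_score(example))
    print(conformance_split(example, score_threshold=0.5))
```

Under these assumptions, the same score computed on the output of two algorithms, or of one algorithm under two parameter settings, gives a basis for comparison, and splitting by match score or by protein abundance would show whether conformance is uniform across a filtered data set.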