Almost all large-scale projects in mass spectrometry-based proteomics use trypsin to convert protein mixtures into more readily analyzable peptide populations. When searching peptide fragmentation spectra against sequence databases, potentially matching peptide sequences can be required to conform to tryptic specificity, namely, cleavage exclusively C-terminal to arginine or lysine. In many published reports, however, significant numbers of proteins are identified by non-tryptic peptides. Here we use the sub-parts per million mass accuracy of a new ion trap Fourier transform mass spectrometer to achieve more than a 100-fold increased confidence in peptide identification compared with typical ion trap experiments and show that trypsin cleaves solely C-terminal to arginine and lysine. We find that non-tryptic peptides occur only as the C-terminal peptides of proteins and as breakup products of fully tryptic peptides N-terminal to an internal proline. Simulating lower mass accuracy led to a large number of proteins erroneously identified with non-tryptic peptide hits. Our results indicate that such peptide hits in previous studies should be re-examined and that peptide identification should be based on strict trypsin specificity. Molecular & Cellular Proteomics 3:608-614, 2004.
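As an illustration of the tryptic specificity rule stated above (cleavage exclusively C-terminal to arginine or lysine), the following is a minimal Python sketch, not the search software used in this study. The function names `tryptic_digest` and `is_fully_tryptic` and the toy sequence are hypothetical. The sketch assumes no missed cleavages, treats a peptide ending at the protein C terminus as allowed, and looks only at the first occurrence of a peptide in the protein.

```python
TRYPTIC_RESIDUES = frozenset("KR")  # lysine and arginine, one-letter codes


def tryptic_digest(protein):
    """Cut after every K or R (assumption: complete digestion, no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in TRYPTIC_RESIDUES:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # The C-terminal peptide of the protein need not end in K or R.
        peptides.append(protein[start:])
    return peptides


def is_fully_tryptic(protein, peptide):
    """Return True if both peptide termini are consistent with strict trypsin specificity.

    Only the first occurrence of the peptide in the protein is examined.
    """
    if not peptide:
        raise ValueError("peptide must not be empty")
    pos = protein.find(peptide)
    if pos < 0:
        raise ValueError("peptide not found in protein")
    end = pos + len(peptide)
    # N-terminal side: protein start, or the preceding residue is K or R.
    n_term_ok = pos == 0 or protein[pos - 1] in TRYPTIC_RESIDUES
    # C-terminal side: protein end, or the peptide itself ends in K or R.
    c_term_ok = end == len(protein) or peptide[-1] in TRYPTIC_RESIDUES
    return n_term_ok and c_term_ok


toy_protein = "AGKPEPTIDERLMNQ"  # hypothetical sequence for illustration only
peptides = tryptic_digest(toy_protein)
```

With this toy sequence, `is_fully_tryptic(toy_protein, "EPTIDER")` is false because the residue preceding the peptide is proline rather than lysine or arginine, whereas the C-terminal peptide `"LMNQ"` is accepted even though it does not end in a basic residue.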
Mass spectrometry (MS)-based proteomics almost invariably involves the enzymatic degradation of proteins to peptides by trypsin (1). This protease has high cleavage specificity, is very aggressive, and is stable under a wide variety of conditions. Most importantly, cleaving C-terminal to arginine or lysine residues leads to peptides in the preferred mass range for effective fragmentation by tandem mass spectrometry (MS/MS) and places the highly basic residues at the C termini of the peptides. This generally leads to informative high mass y-ion series and makes tandem mass spectra more easily interpretable.
When analyzing peptide mixtures by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS), a large number of fragmentation events occur. The tandem mass spectra are searched against amino acid sequence databases by one of a number of database search algorithms. The identified peptides receive a score and are combined into lists of identified proteins. A critical question in these experiments is what constitutes a reliable peptide and protein hit (2). Some laboratories save raw mass spectrometric data and interpret these raw data in all questionable cases. In some algorithms, the score is itself a probability and can be used to estimate levels of false positives (incorrect hits) and false negatives (missed hits). For other algorithms, this question has been addressed by analyzing defined mixtures of known proteins (3) or by searching in reversed databases that should not yield significant hits (4, 5). On the basis of these findings, a set of parameters for the scores is often defined that will yield a given trade-off of false positives and false negatives. Recently, more sophisticated statistical learning algorithms have been...