SUMMARY Melanoma and other cancers harbor oncogenic mutations in the protein kinase B-Raf, which leads to constitutive activation and dysregulation of MAP kinase signaling. In order to elucidate molecular determinants responsible for B-Raf control of cancer phenotypes, we present a method for phosphoprotein profiling, using negative ionization mass spectrometry to detect phosphopeptides based on their fragment ion signature caused by release of PO3−. The method provides an alternative strategy for phosphoproteomics, circumventing affinity enrichment of phosphopeptides and isotopic labeling of samples. Ninety phosphorylation events were regulated by oncogenic B-Raf signaling, based on their responses to treating melanoma cells with MKK1/2 inhibitor. Regulated phosphoproteins included known signaling effectors and cytoskeletal regulators. We investigated MINERVA/FAM129B, a target belonging to a protein family with unknown category and function, and established the importance of this protein and its MAP kinase-dependent phosphorylation in controlling melanoma cell invasion into 3-dimensional collagen matrix.
Correct identification of a peptide sequence from MS/MS data is still a challenging research problem, particularly in proteomic analyses of higher eukaryotes where protein databases are large. The scoring methods of search programs often generate cases where incorrect peptide sequences score higher than correct peptide sequences (referred to as distraction). Because smaller databases yield less distraction and better discrimination between correct and incorrect assignments, we developed a method for editing a peptide-centric database (PC-DB) to remove unlikely sequences and strategies for enabling search programs to utilize this peptide database. Rules for unlikely missed cleavage and nontryptic proteolysis products were identified by data mining 11 849 high-confidence peptide assignments. We also evaluated ion exchange chromatographic behavior as an editing criterion to generate subset databases. When used to search a well-annotated test data set of MS/MS spectra, we found no loss of critical information using PC-DBs, validating the methods for generating and searching against the databases. On the other hand, improved confidence in peptide assignments was achieved for tryptic peptides, measured by changes in DeltaCN and RSP. Decreased distraction was also achieved, consistent with the 3-9-fold decrease in database size. Data mining identified a major class of common nonspecific proteolytic products corresponding to leucine aminopeptidase (LAP) cleavages. Large improvements in identifying LAP products were achieved using the PC-DB approach when compared with conventional searches against protein databases. These results demonstrate that peptide properties can be used to reduce database size, yielding improved accuracy and information capture due to reduced distraction, but with little loss of information compared to conventional protein database searches.
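The idea of a peptide-centric database can be illustrated with a minimal sketch, assuming a simple in-silico trypsin digest (cleave after K or R, but not before P) followed by deduplication and an editing step. The function names, the `keep` filter, and the default parameters are hypothetical placeholders; the actual editing rules mined from the 11 849 assignments are not given in the text above. The zero-width `re.split` requires Python 3.7 or later.

```python
import re

def tryptic_peptides(protein, max_missed=1, min_len=6):
    """In-silico trypsin digest: cleave after K or R, except when followed by P."""
    # Zero-width split keeps every residue; empty trailing pieces are dropped.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for i in range(len(fragments)):
        # Join up to max_missed + 1 adjacent fragments to model missed cleavages.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            pep = "".join(fragments[i:j + 1])
            if len(pep) >= min_len:
                peptides.add(pep)
    return peptides

def build_peptide_db(proteins, max_missed=1, min_len=6, keep=lambda pep: True):
    """Collect unique peptides across proteins; `keep` is a hypothetical editing rule."""
    db = set()
    for seq in proteins:
        db.update(p for p in tryptic_peptides(seq, max_missed, min_len) if keep(p))
    return sorted(db)
```

Any rule that removes unlikely sequences (for example, an improbable missed cleavage) would be passed in as `keep`; the smaller the resulting list, the fewer incorrect candidates compete with the correct one during a search.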
The unambiguous assignment of tandem mass spectra (MS/MS) to peptide sequences remains a key unsolved problem in proteomics. Spectral library search strategies have emerged as a promising alternative for peptide identification, in which MS/MS spectra are directly compared against a reference library of confidently assigned spectra. Two problems relate to library size. First, reference spectral libraries are limited to rediscovery of previously identified peptides and are not applicable to new peptides, because of their incomplete coverage of the human proteome. Second, problems arise when searching a spectral library the size of the entire human proteome. We observed that traditional dot product scoring methods do not scale well with spectral library size, showing a reduction in sensitivity when library size is increased. We show that this problem can be addressed by optimizing scoring metrics for spectrum-to-spectrum searches with large spectral libraries. MS/MS spectra for the 1.3 million predicted tryptic peptides in the human proteome are simulated using a kinetic fragmentation model (MassAnalyzer version 2.1) to create a proteome-wide simulated spectral library. Searches of the simulated library increase MS/MS assignments by 24% compared with Mascot, when using probabilistic and rank-based scoring methods. The proteome-wide coverage of the simulated library leads to an 11% increase in unique peptide assignments, compared with parallel searches of a reference spectral library. Further improvement is attained when reference spectra and simulated spectra are combined into a hybrid spectral library, yielding 52% increased MS/MS assignments compared with Mascot searches. Our study demonstrates the advantages of using probabilistic and rank-based scores to improve performance of spectrum-to-spectrum search strategies.
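For orientation, the traditional dot product scoring mentioned above can be sketched as a normalized dot product (cosine similarity) between binned spectra. This is only the baseline that the abstract describes as scaling poorly, not the probabilistic or rank-based scores the authors propose. The function names, the 1.0 m/z bin width, and the peak-list format are assumptions made for illustration.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins; peaks is [(mz, intensity), ...]."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned

def dot_product(a, b):
    """Normalized dot product (cosine similarity) of two binned spectra."""
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / norm if norm else 0.0

def rank_library(query, library, bin_width=1.0):
    """Score a query against every library spectrum and sort best-first."""
    q = bin_spectrum(query, bin_width)
    scored = [(dot_product(q, bin_spectrum(peaks, bin_width)), name)
              for name, peaks in library.items()]
    return sorted(scored, reverse=True)
```

With a library of a few entries the top-ranked score is easy to interpret; as the library grows, more unrelated spectra reach moderately high scores by chance, which is one way to picture the loss of sensitivity reported above.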
A major limitation in identifying peptides from complex mixtures by shotgun proteomics is the ability of search programs to accurately assign peptide sequences using mass spectrometric fragmentation spectra (MS/MS spectra). Manual analysis is used to assess borderline identifications; however, it is error-prone and time-consuming, and criteria for acceptance or rejection are not well defined. Here we report a Manual Analysis Emulator (MAE) program that evaluates results from search programs by implementing two commonly used criteria: 1) consistency of fragment ion intensities with predicted gas phase chemistry and 2) whether a high proportion of the ion intensity (proportion of ion current (PIC)) in the MS/MS spectra can be derived from the peptide sequence.
Recent advances in genome sequencing, MS instrumentation, and chromatographic methods allow high throughput identification of peptide fragmentation spectra (MS/MS spectra)¹ from samples as complex as whole cell extracts (1, 2). For very complex samples, the best results are obtained from ion trap mass spectrometers due to their fast scanning rate and ability to rapidly shift between MS and MS/MS modes during data collection. However, when operated for high data collection rate, mass accuracy and resolution are compromised. It was recognized early on that scores from search programs showed poor discrimination between correct and incorrect sequence assignments when using ion trap MS/MS spectral data to search large protein databases (3). Methods have been developed to specify thresholds for acceptance either by searching datasets against an inverted sequence database of similar size to identify false positive thresholds (4) or by statistical analysis of multiple scores and results from normal searches (4, 5). Using methods such as these, limits on search program scores or combinations of scores can be set to yield a low number of false positives (6), but they also will produce large false negative rates (4, 7).
This problem is more acute when larger databases are used. To minimize false negatives, investigators often reduce the acceptance threshold to capture more information. Several methods have been developed to filter the resulting false positives based on agreement between sequence composition of the peptides and their behavior on ion exchange or reverse phase chromatography (7, 8), probability of missed cleavages (9), exact mass measurements (8), or differences in scores between the top ranking peptides and lower ranked candidates (10, 11). Methods have also utilized intensity in-
¹ The abbreviations used are: MS/MS spectra, fragmentation spectra; ΔCN, difference between first and second ranked assignments by XCorr; DTA, text file summary of MS/MS spectral information; sDTA, simplified DTA; IntFrag score, proportion of ion current assigned to internal fragment ions;
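The proportion of ion current (PIC) criterion described above can be pictured with a minimal sketch: the fraction of total observed intensity that falls within a mass tolerance of any fragment predicted from the candidate sequence. This is an illustration under stated assumptions, not the MAE program itself; the function name, the 0.5 m/z tolerance, and the input formats are hypothetical.

```python
def proportion_of_ion_current(observed, predicted_mz, tol=0.5):
    """Fraction of observed intensity within `tol` of any predicted fragment m/z.

    observed     -- list of (mz, intensity) peaks from the MS/MS spectrum
    predicted_mz -- list of fragment m/z values derived from the candidate peptide
    tol          -- assumed matching tolerance in m/z units (hypothetical default)
    """
    total = sum(intensity for _, intensity in observed)
    if total == 0:
        return 0.0
    matched = sum(
        intensity
        for mz, intensity in observed
        if any(abs(mz - p) <= tol for p in predicted_mz)
    )
    return matched / total
```

A candidate sequence that accounts for only a small share of the ion current is a natural one to question, which is the judgment a manual analyst would otherwise make by eye.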
Identifying peptides from mass spectrometric fragmentation data (MS/MS spectra) using search strategies that map protein sequences to spectra is computationally expensive. An alternative strategy uses direct spectrum-to-spectrum matching against a reference library of previously observed MS/MS spectra, which has the advantage of evaluating matches using fragment ion intensities and ion types other than the simple set normally used. However, this approach is limited by the small sizes of the available peptide MS/MS libraries and the inability to evaluate the rate of false assignments. In this study, we observed good performance of simulated spectra generated by the kinetic model implemented in MassAnalyzer. We also demonstrate the use of simulated spectra for searching against decoy sequences to estimate false discovery rates. Although we found lower score discrimination with spectrum-to-spectrum searches than with Mascot, particularly for higher charge forms, comparable peptide assignments with low false discovery rate were achieved by examining consensus between X!Hunter and Mascot, filtering results by mass accuracy, and ignoring score thresholds. Protein identification results are comparable to those achieved when evaluating consensus between Sequest and Mascot. Run times with large scale data sets using X!Hunter with the simulated spectral library are 7 times faster than Mascot and 80 times faster than Sequest with the human International Protein Index (IPI) database. We conclude that simulated spectral libraries greatly expand the search space available for spectrum-to-spectrum searching while enabling principled analyses and that the approach can be used in consensus strategies for large scale studies while reducing search times.
Identification of proteins in complex samples is a major new area in bioinformatics.
The most successful method currently available is shotgun proteomics where proteins are proteolyzed into peptides (usually by trypsin) followed by large scale sequencing of peptides by on-line chromatographic separation and fragmentation in a mass spectrometer (LC-MS/MS). The fragmentation process generates spectra (referred to as MS/MS spectra) from which peptide sequences consistent with the observed fragment ions can be identified (1). A common computational strategy for matching an MS/MS spectrum to a peptide sequence involves interconverting spectral and sequence information (2). Spectrum-to-sequence database search programs match peptide sequences to spectra in one of two ways: by 1) extracting sequence information from an observed spectrum and matching the sequences against peptides contained in a protein database or 2) converting peptide sequences from the protein database into simple spectra (e.g. predicting a subset of possible b and y fragment ions generated by peptide bond cleavage) and matching the predicted fragment ions to those observed. Various scoring methods are then used to evaluate overlap between observed and predicted fragments, including use of probability functions or spectral similar...
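The second strategy above, converting a peptide sequence into a simple predicted spectrum and matching it to observed peaks, can be sketched as follows. This is a minimal illustration assuming singly charged b and y ions, unmodified residues, and standard monoisotopic masses; the function names and the 0.5 m/z tolerance are hypothetical, and real search programs use more elaborate scoring.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O

def by_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values for each bond cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b, y

def count_matches(observed_mz, peptide, tol=0.5):
    """Count predicted b/y ions that have an observed peak within `tol` (assumed tolerance)."""
    b, y = by_ions(peptide)
    return sum(any(abs(obs - pred) <= tol for obs in observed_mz) for pred in b + y)
```

Counting shared peaks is the simplest possible overlap measure; it ignores fragment intensities, which is part of what spectrum-to-spectrum matching is meant to recover.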