Intensity-Based Statistical Scorer for Tandem Mass Spectrometry

Havilio, Moshe; Haddad, Yariv; Smilansky, Zeev

doi:10.1021/ac0258913

Cited by 161 publications

(138 citation statements)

References 22 publications

Supporting

Mentioning

135

Contrasting


“…The very high percentage of spectra correctly assigned by Prospector in this study (over two-thirds) is in contrast to most previously published dataset of high throughput ion trap data where between 5 and 15% of the acquired spectra could be interpreted (2,3,26), although one study has reported 40% identification (25). This is unlikely to be a reflection of the reliability of results searched with different search engines but rather a measure of the relative quality of data acquired on a QqTOF instrument in comparison to that acquired on an ion trap both in terms of mass accuracy and the presence of a full mass range in the fragmentation spectra.…”

Section: Fig (contrasting)

confidence: 93%

Comprehensive Analysis of a Multidimensional Liquid Chromatography Mass Spectrometry Dataset Acquired on a Quadrupole Selecting, Quadrupole Collision Cell, Time-of-flight Mass Spectrometer

Chalkley

Baker

Huang

et al. 2005

Molecular & Cellular Proteomics

182

114


A thorough analysis of the protein interaction partners of the yeast GTPase Gsp1p was carried out by a multidimensional chromatography strategy of strong cation exchange fractionation of peptides followed by reverse phase LC-ESI-MSMS using a QSTAR instrument. This dataset was then analyzed using the latest developmental version of Protein Prospector. The Prospector search results were also compared with results from the search engine "Mascot" using a new results comparison program within Prospector named "SearchCompare." The results from this study demonstrate that the high quality data produced on a quadrupole selecting, quadrupole collision cell, time-of-flight (QqTOF) geometry instrument allows for confident assignment of the vast majority of interpretable spectra by current search engines. Modern mass spectrometers are able to produce large amounts of information-rich data in relatively short periods of time. The bottleneck in mass spectrometry-based peptide and protein identification is now at the stage of data analysis and verification of results. There are several search engines available that can analyze large datasets in a batch fashion, most notably Mascot (www.matrixscience.com) and Sequest (1). Although it would be desirable to be able to quote results from such searches without a need to look at and evaluate the raw data, this is not without risk at the moment as although both use probability-based scoring systems, the reliability of results from Sequest are known to be problematic (2, 3), and no extensive study of the performance of Mascot on large datasets has been published. Hence a number of groups have developed statistical analysis programs for evaluating these search results to be able to better define the reliability of the reported matches (4-7).
In addition, the data analyzed by these search engines are not the raw data but rather peak centroided mass lists extracted from the raw data that do not always fully represent the information content in the raw data. A summary of the complications arising from automated peptide and protein identification has recently been published (8). Protein Prospector contains a suite of programs developed at University of California, San Francisco that is used for analysis of proteomic data (www.prospector.ucsf.edu). Historically it has been one of the major programs in proteomic analysis; however, the current web version (version 4.0.5) does not have the ability to analyze multiple MSMS spectra simultaneously in a batch fashion. Thus, its current use in analyzing large datasets is limited. Hence we have developed new programs within the Prospector framework specifically designed for large dataset analysis and comparison. The first of these is "Batch Tag," which is based on the well established MS-Tag program but is able to analyze files containing large numbers of spectra from one or multiple sample fractions. A new program within Protein Prospector called "SearchCompare" has been developed that is able to summarize and filter large dataset results. It also c...

“…A number of algorithms and scoring models have been developed to assess the likelihood of a match. They can show different selectivity and sensitivity at the edge of good spectral quality, and some programs have enough flexibility to permit the use of different types of MS/MS data or modification patterns 6,7,40-53. Four basic approaches have been developed to model matches to sequences: descriptive, interpretative, stochastic and probability-based modeling (Box 2).…”

Section: Review Of Database Search Algorithms (mentioning)

confidence: 99%

“…Statistical and probability models for database searching: This group of methods uses models based on empirically generated fragment ion probabilities 45,48,51. In these methods no a priori determined probabilities are used.…”

Section: Box 3 Strategy For Large Scale Data Analysis (mentioning)

confidence: 99%

“…Thus, in the simplest models the frequencies of matches of b- and y-ions are determined and used to calculate a probability of sequence identification determined by the product of probabilities of its fragment matches. Several variations of this approach have been implemented in database searching algorithms 43,45,48,51. Mascot 41 uses a model analogous to the one previously developed for identifying proteins from their peptide mass fingerprint 3.…”

Section: Box 3 Strategy For Large Scale Data Analysis (mentioning)

confidence: 99%
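
The "product of probabilities of its fragment matches" described in the statement above can be illustrated with a minimal sketch. This is only one possible reading, not the scorer of Havilio et al. or of any cited search engine: the per-ion match frequencies below are hypothetical placeholder values, the function names are invented for illustration, and treating each predicted fragment as an independent matched/unmatched event is an assumption.

```python
import math

# Hypothetical, illustrative match frequencies per ion type.
# These numbers are placeholders, not values from any cited paper.
ION_MATCH_FREQ = {"b": 0.4, "y": 0.6}


def log_match_score(observations, freq=ION_MATCH_FREQ):
    """Sum log-probabilities over the predicted fragments of one peptide.

    observations: iterable of (ion_type, matched) pairs, one per predicted
    fragment. A matched fragment contributes log(p); an unmatched fragment
    contributes log(1 - p), where p is the empirical match frequency.
    Working in log space avoids underflow for long peptides.
    """
    total = 0.0
    for ion_type, matched in observations:
        p = freq[ion_type]
        total += math.log(p if matched else 1.0 - p)
    return total


def match_probability(observations, freq=ION_MATCH_FREQ):
    """Product of the per-fragment probabilities (assumes independence)."""
    return math.exp(log_match_score(observations, freq))


if __name__ == "__main__":
    # One b-ion matched, one y-ion matched, one y-ion missed.
    example = [("b", True), ("y", True), ("y", False)]
    print(match_probability(example))
```

In practice the log score, not the raw product, would be the quantity compared across candidate peptides, since the product shrinks quickly as the number of fragments grows.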


Large-scale database searching using tandem mass spectra: Looking up the answer in the back of the book

2004


Database searching is an essential element of large-scale proteomics. Because these methods are widely used, it is important to understand the rationale of the algorithms. Most algorithms are based on concepts first developed in SEQUEST and PeptideSearch. Four basic approaches are used to determine a match between a spectrum and sequence: descriptive, interpretative, stochastic and probability-based matching. We review the basic concepts used by most search algorithms, the computational modeling of peptide identification and current challenges and limitations of this approach for protein identification.
An unintended consequence of whole-genome sequencing has been the birth of large-scale proteomics. What drives proteomics is the ability to use mass spectrometry data of peptides as an 'address' or 'zip code' to locate proteins in sequence databases. Two mass spectrometry methods are used to identify proteins by database search methods. The first method uses a molecular weight fingerprint measured from a protein digested with a site-specific protease 1-5. A second method uses tandem mass spectra derived from individual peptides of a digested protein 6,7 (Fig. 1). Because each tandem mass spectrum represents an independent and verifiable piece of data, this approach to database searching has the ability to identify proteins in mixtures, enabling a rapid and comprehensive approach for the analysis of protein complexes and other complicated mixtures of proteins 6,8-12. New biology has been discovered based on fast and accurate protein identification 13-18.
As tandem mass spectral protein identification has proliferated, it has become increasingly important to understand the rationale of individual database search algorithms, their relative strengths and weaknesses, and the mathematics used to match sequence to spectrum.
In this review we discuss the prevailing fragmentation models, spectral preprocessing, methods to match tandem mass spectra to sequences and several approaches to matching tandem mass spectra of peptides whose exact sequences may not be present in the database. Space limitations restrict a detailed description of all algorithms in this rapidly expanding field. Also, some algorithms are proprietary, and thus, details on how they work are unknown. This review should supplement and update earlier reviews on database search algorithms 19-24.
Peptide fragmentation and data preprocessing
In tandem mass spectrometry (MS/MS), gas phase peptide ions undergo collision-induced dissociation (CID) with molecules of an inert gas such as helium or argon 25. Other methods of dissociation have been developed, such as electron capture dissociation (ECD), surface induced dissociation (SID) and electron transfer dissociation (ETD), but gas-phase CID is the most widely used in commercial tandem mass spectrometers. The dissociation pathways are strongly dependent on the collision energy, but the vast majority of instruments use low-energy CID (<100 eV) 26 ....


“…Since collision-activated dissociation (CAD) was performed by an LTQ mass spectrometer we expect to find the typical abundant fragments such as y- and b-peaks and their derivatives [9,36,37,16]. However, with FT-ICR it is possible to detect rare ion-fragments, which could not be identified with lower resolution instruments since they would be indistinguishable from noise (see for example analysis on similar data with low resolution instruments [38]).…”

Section: Fourier Transform Mass Spectrometry and Peptide Fragmentation (mentioning)

confidence: 99%

De Novo Peptide Sequencing and Identification with Precision Mass Spectrometry

et al. 2006


The recent proliferation of novel mass spectrometers such as Fourier-Transform, Qtof and OrbiTrap marks a transition into the era of precision mass spectrometry, providing a boost of two orders of magnitude in mass resolution compared to low precision ion-trap detectors. We investigate peptide de novo sequencing by precision mass spectrometry and explore some of the differences when compared to analysis of low precision data. We demonstrate how the dramatically improved performance of de novo sequencing with precision mass spectrometry paves the way for novel approaches to peptide identification that are based on direct sequence lookups, rather than comparisons of spectra to a database. With the direct sequence lookup it is not only possible to search a database very efficiently, but it also opens the possibility of using the database in novel ways, such as searching for products of alternative splicing or products of fusion proteins in cancer. Our de novo sequencing software is available for download at http://peptide.ucsd.edu/.
