pDeep: Predicting MS/MS Spectra of Peptides with Deep Learning

Zhou, Xie-Xuan; Zeng, Wen‐Feng; Chi, Hao; Luo, Chunjie; Liu, Chao; Zhan, Jianfeng; He, Shan; Zhang, Zhifei

doi:10.1021/acs.analchem.7b02566

Cited by 209 publications (224 citation statements)

References 22 publications

Citation statements classified as mentioning: 222

“…We reasoned that a combination of very large and consistent data sets acquired by PASEF with state of the art deep learning methods would address both challenges. Due to their inherent flexibility and their ability to scale to large data sets, deep learning methods have proven very successful in genomics 30,31 and more recently in proteomics for the prediction of retention times and fragmentation spectra [32][33][34][35] .…”

Classification: mentioning (confidence: 99%)

Deep learning the collisional cross sections of the peptide universe from a million training samples

Meier, Brunner et al. 2020

Preprint

The size and shape of peptide ions in the gas phase are an under-explored dimension for mass spectrometry-based proteomics. To explore the nature and utility of the entire peptide collisional cross section (CCS) space, we measure more than a million data points from whole-proteome digests of five organisms with trapped ion mobility spectrometry (TIMS) and parallel accumulation-serial fragmentation (PASEF). The scale and precision (CV <1%) of our data are sufficient to train a deep recurrent neural network that accurately predicts CCS values solely based on the peptide sequence. Cross section predictions for the synthetic ProteomeTools library validate the model within a 1.3% median relative error (R > 0.99). Hydrophobicity and the positions of prolines and histidines are the main determinants of the cross sections, in addition to sequence-specific interactions. CCS values can now be predicted for any peptide and organism, forming a basis for advanced proteomics workflows that make full use of the additional information.
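The 1.3% median relative error quoted above can be read as the median, over all validation peptides, of |predicted CCS - measured CCS| / measured CCS. The sketch below assumes that definition (the abstract does not spell out the formula) and uses made-up CCS values, not data from the preprint; the function name `median_relative_error` is hypothetical.

```python
from statistics import median

def median_relative_error(predicted, observed):
    """Median of |predicted - observed| / observed over paired values (assumed definition)."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("expected two non-empty sequences of equal length")
    # relative error per peptide, normalized by the measured value
    errors = [abs(p - o) / o for p, o in zip(predicted, observed)]
    return median(errors)

# Made-up CCS values (square angstroms), for illustration only
measured = [400.0, 500.0, 600.0]
predicted = [404.0, 495.0, 612.0]
print(f"{median_relative_error(predicted, measured):.2%}")  # 1.00%
```

Using the median rather than the mean keeps a handful of badly predicted peptides from dominating the summary figure.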

“…Training on simulated data or domain randomization Tobin et al (2017) is a powerful method to improve performance for models and reducing the generalization gap. The prediction of relative intensity profiles is well received in the proteomics community as it improves database search substantially Gabriels et al (2019); Zhou et al (2017); Gessulat et al (2019). This in combination with a sophisticated noise model may become very helpful for domain randomization and training models like AHLF in the near future.…”

Section: Discussion

Classification: mentioning (confidence: 99%)

AHLF: ad hoc learning of peptide fragmentation from mass spectra enables an interpretable detection of phosphorylated and cross-linked peptides

Altenburg, Wang, Muth et al. 2020

Preprint

Motivation: Publicly available mass spectrometry-based proteomics data have grown exponentially in the recent past. Yet, large-scale spectrum-centered analysis usually involves predefined fragmentation features that are limited and prone to bias. Using deep learning, the decision making for a suitable fragmentation model can be carried out in a data-driven manner.

Results: We introduce a framework that allows end-to-end training of generic deep learning models on a large collection of high-resolution tandem mass spectra. In this case we used 19.2 million labeled spectra from more than a hundred individual PRIDE repositories. In our framework, we developed a representation that captures the complete information of a high-resolution spectrum, facilitating a lossless reduction of the number of features largely independent of the actual resolution. Additionally, it allows us to use common trainable layers, e.g. recurrent or convolutional operations. Specifically, we use a deep network of stacked dilated convolutions to model long-range associations between any peaks within a tandem mass spectrum. We exemplify our approach by learning to detect post-translational modifications (in this case, protein phosphorylation) based only on a given mass spectrum, in a fully data-driven manner. To the best of our knowledge, this is the first end-to-end trained deep learning model on tandem spectra that is able to learn fragmentation patterns ad hoc in high-resolution spectra. Our approach outperforms the current state of the art in predicting whether a mass spectrum originates from a phosphorylated peptide.

Availability: Our deep learning framework is implemented in TensorFlow. The open source code, including trained weights, is available at gitlab.com/dacs-hpi/ahlf

Contact: bernhard.renard@hpi.de
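Stacked dilated convolutions can relate distant peaks because each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d input positions. A minimal sketch of that arithmetic, assuming stride-1 1D convolutions with the dilation doubling at every layer; the kernel size and layer count are hypothetical and not taken from AHLF.

```python
def receptive_field(kernel_size, dilations):
    """Input positions seen by one output of stacked stride-1 1D dilated convolutions."""
    rf = 1
    for d in dilations:
        # each layer reaches (kernel_size - 1) * d positions further
        rf += (kernel_size - 1) * d
    return rf

# Hypothetical configuration: kernel size 3, dilation doubling over 10 layers
dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512
print(receptive_field(3, dilations))  # 2047
```

With doubling dilations and kernel size 3, n layers cover 2^(n+1) - 1 positions, so the receptive field grows exponentially with depth while the number of weights grows only linearly.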

“…Before constructing a virtual spectral library, we tested the performance of several deep learning models to predict fragment ion intensities and retention time indices (iRT) for the 415 GPCR peptide precursors from the initial DIA spectral library. Distinct from the aforementioned whole-proteome virtual library approaches, we here used the deep neural network-based models pDeep (Zhou et al, 2017) to predict fragment ion intensities and DeepRT (Ma et al, 2018) to predict iRT from GPCR peptide sequences (Fig. 1).…”

Section: Constructing a GPCR-targeted Virtual Library With Re-trained

Classification: mentioning (confidence: 99%)

A hybrid spectral library combining DIA-MS data and a targeted virtual library substantially deepens the proteome coverage

Lou, Tang, Ding et al. 2020

Preprint

Data-independent acquisition mass spectrometry (DIA-MS) is a rapidly evolving technique that enables relatively deep proteomic profiling with superior quantification reproducibility. DIA data mining predominantly relies on a spectral library of sufficient proteome coverage that, in most cases, is built on data-dependent acquisition-based analysis of the same sample. To expand the proteome coverage for a pre-determined protein family, we report herein on the construction of a hybrid spectral library that supplements a DIA experiment-derived library with a protein family-targeted virtual library predicted by deep learning. Leveraging this DIA hybrid library substantially deepens the coverage of three transmembrane protein families (G protein-coupled receptors, ion channels, and transporters) in mouse brain tissues, with increases in protein identification of 37-87% and in peptide identification of 58-161%. Moreover, of the 412 novel GPCR peptides exclusively identified with the DIA hybrid library strategy, 53.6% were validated as present in mouse brain tissues based on orthogonal experimental measurement.
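One simple way to picture the hybrid library is as a union keyed by precursor, in which experimentally derived DIA entries are kept and predicted entries fill in only the precursors the DIA library lacks. The sketch below assumes that precedence rule and a (peptide, charge) key; both, along with the `LibraryEntry` fields, the `build_hybrid_library` name, and the example sequences, are hypothetical and not the preprint's actual procedure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LibraryEntry:
    """Hypothetical spectral library entry for one precursor."""
    peptide: str        # peptide sequence (modifications omitted for simplicity)
    charge: int         # precursor charge state
    irt: float          # retention time index
    fragments: tuple    # ((ion_label, relative_intensity), ...)
    source: str         # "dia" (experimental) or "virtual" (predicted)

def build_hybrid_library(dia_entries, virtual_entries):
    """Keep all DIA-derived entries; add predicted entries only for new precursors."""
    hybrid = {(e.peptide, e.charge): e for e in dia_entries}
    for e in virtual_entries:
        # setdefault leaves an existing experimental entry untouched
        hybrid.setdefault((e.peptide, e.charge), e)
    return list(hybrid.values())

# Hypothetical entries for illustration only
dia = [LibraryEntry("PEPTIDEK", 2, 35.1, (("y3", 1.0),), "dia")]
virtual = [
    LibraryEntry("PEPTIDEK", 2, 34.8, (("y3", 0.9),), "virtual"),
    LibraryEntry("LGVAEELR", 2, 50.2, (("y4", 1.0),), "virtual"),
]
hybrid = build_hybrid_library(dia, virtual)
print([(e.peptide, e.source) for e in hybrid])
# [('PEPTIDEK', 'dia'), ('LGVAEELR', 'virtual')]
```

Preferring the experimental entry when both exist is one plausible design choice, since a measured spectrum and retention time reflect the actual instrument and chromatography.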
