2022
DOI: 10.1101/2022.02.07.479481
Preprint

De novo mass spectrometry peptide sequencing with a transformer model

Abstract: Tandem mass spectrometry is the only high-throughput method for analyzing the protein content of complex biological samples and is thus the primary technology driving the growth of the field of proteomics. A key outstanding challenge in this field involves identifying the sequence of amino acids -- the peptide -- responsible for generating each observed spectrum, without making use of prior knowledge in the form of a peptide sequence database. Although various machine learning methods have been developed to ad…

Cited by 47 publications (66 citation statements)
References 26 publications (63 reference statements)
“…However, in the last couple of years, a structure first in natural language processing 15 known as Transformers 16 has successfully been employed within bioinformatics, e.g., structure prediction, 17,18 gene expression prediction, 19 and even within MS-based proteomics, e.g., peptide detection problem, 20 DIA library generation for the phosphoproteome, 21 and de novo interpretation of MS2 spectra. 22 …”
Section: Introduction (mentioning)
confidence: 99%
“…However, in the last couple of years, a structure first in natural language processing 15 known as Transformers 16 has successfully been employed within bioinformatics, e.g., structure prediction, 17,18 gene expression prediction, 19 and even within MS-based proteomics, e.g., peptide detection problem, 20 DIA library generation for the phosphoproteome, 21 and de novo interpretation of MS2 spectra. 22 Transformers are, like RNNs, designed to handle sequential input data and do so through attention mechanisms, i.e., mechanisms that enhance the essential parts of the input sequence for its output. However, unlike RNNs, the Transformers do not use recurrence, thus enabling a significant speed-up by parallelizing their training.…”
Section: Introduction (mentioning)
confidence: 99%
“…Cysteine carbamidomethylation was set as a fixed modification, and variable modifications were methionine oxidation, N-terminal acetylation, N-terminal carbamylation, pyroglutamate formation from glutamine, and deamidation of asparagine and glutamine. MSGF+ was configured to allow one 13C precursor mass isotope, at most one non-tryptic terminus, and 10 ppm precursor mass tolerance. The searches were individually filtered at 1% PSM-level FDR.…”
Section: Methods and Results (mentioning)
confidence: 99%
“…This dovetails with visualization functionality to produce publication-quality and interactive spectrum graphics. Since its introduction, spectrum_utils has been used in tools that perform spectral library searching 10 and spectrum clustering, 11 to preprocess MS/MS spectra prior to deep learning applications, 12,13 to plot MS/MS spectra from online data repositories, 14 and to assist in MS/MS processing and visualization efforts for dozens of other projects. 15–22 Here we present recent updates to spectrum_utils.…”
Section: Introduction (mentioning)
confidence: 99%