2023
DOI: 10.1101/2023.01.03.522621
Preprint

Sequence-to-sequence translation from mass spectra to peptides with a transformer model

Abstract: A fundamental challenge for any mass spectrometry-based proteomics experiment is the identification of the peptide that generated each acquired tandem mass spectrum. Although approaches that leverage known peptide sequence databases are widely used and effective for well-characterized model organisms, such methods cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to the acquired tandem mass spectra without prior infor…

citations
Cited by 26 publications
(52 citation statements)
references
References 58 publications
“…Both DeepNovo and PointNovo pass their peak encodings to an LSTM or an output layer to predict the next amino acid. Casanovo frames the problem as a sequence-to-sequence problem [32] and employs a transformer encoder-decoder framework to process and predict sequences of amino acids. PointNovo and Casanovo were both retrained using their respective official GitHub repositories (github.com/volpato30/DeepNovoV2 for PointNovo, and github.com/Noble-Lab/casanovo for Casanovo).…”
Section: Pointnovo and Casanovo Implementations
mentioning
confidence: 99%
“…Finally, the SoftMax function is used to convert the array Z_20 to an output array P_20 with values between 0 and 1 (eq 4), which is the probability distribution of the AA category. PSM Scoring Function in CNovo.…”
Section: Experimental Section
mentioning
confidence: 99%
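The SoftMax step described in the statement above can be illustrated with a minimal sketch. It assumes one raw score per standard amino acid (20 in total); the residue ordering and the names AMINO_ACIDS, softmax, and most_likely_residue are hypothetical and are not taken from the cited paper or from CNovo.

```python
import math

# Assumption: 20 standard amino acids in an arbitrary (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(scores):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_residue(scores):
    """Return the highest-probability amino acid and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return AMINO_ACIDS[best], probs[best]
```

Subtracting the maximum score before exponentiating leaves the result unchanged but avoids overflow for large scores.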
“…Casanovo was recently reported to use a transformer architecture for de novo peptide sequencing with higher accuracy than DeepNovo and Novor. 20 We also compared SpliceNovo with Casanovo. In DS1-JP, DS2-JP, and DS3-JP, the peptide recall rate for the top-1 results of SpliceNovo was 1.9, −2.8, and −0.5% higher than that of Casanovo (Figure S12a−c).…”
Section: Analytical
mentioning
confidence: 99%
“…1 Over these decades, numerous algorithmic advances have steadily improved MS data interpretation for sequence identification. 2 The pace of this progress continues unabated along many avenues, including improved interpretation of complex spectra from multiplexed data independent acquisition (plexDIA), 3 de novo sequencing with new embeddings and transformer neural network architectures, 4 improvements in open searches and the identification of peptide modifications, 5 and improved models of isotopic compositions. The latter advances are exemplified by a new approach termed Conditional fragment Ion Distribution Search (CIDS). 6 CIDS can substantially increase sequence identification rates for peptides labeled by using heavy water (²H) or ¹⁵N since such peptides have structural isomers and the distributions of their fragment ions have been difficult to predict.…”
mentioning
confidence: 99%
“…Over these decades, numerous algorithmic advances have steadily improved MS data interpretation for sequence identification. The pace of this progress continues unabated along many avenues, including improved interpretation of complex spectra from multiplexed data independent acquisition (plexDIA), de novo sequencing with new embeddings and transformer neural network architectures, improvements in open searches and the identification of peptide modifications, and improved models of isotopic compositions.…”
mentioning
confidence: 99%