2021
DOI: 10.1101/2021.06.25.449969
Preprint

MassGenie: a transformer-based deep learning method for identifying small molecules from their mass spectra

Abstract: The 'inverse problem' of mass spectrometric molecular identification ('given a mass spectrum, calculate the molecule whence it came') is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem ('calculate a small molecule's likely fragmentation and hence at least some of its mass spectrum from its structure a…

Cited by 12 publications
(14 citation statements)
References 90 publications
“…CSI:FingerID utilizes fragmentation trees that best explain the spectra (31,32) to improve mapping candidates to their corresponding fingerprints (9,33). In addition to learning to rank, we expect auxiliary information in various forms, such as biochemical (34) data or data augmentation (10) to improve annotation.…”
Section: Discussion (mentioning)
confidence: 99%
“…L = −cos(ŷ, y) [10], where PRED(·) is a neural network that predicts spectra from the molecular representation z, and cos(·, ·) is the cosine similarity between the predicted spectra, ŷ, and the query spectra, y. For all PRED(·) functions, we applied a two-layer MLP augmented with bidirectional prediction mode (17), which increases the prediction accuracy on the larger fragments that arise due to neutral losses.…”
Section: MLP- and GNN-based Molecular Encoding (mentioning)
confidence: 99%
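
The loss quoted above, L = −cos(ŷ, y), can be illustrated with a minimal sketch. It assumes both spectra are already represented as equal-length intensity vectors; the function names `cosine_similarity` and `negative_cosine_loss` are hypothetical and are not taken from the cited paper, and the PRED(·) network itself is not modeled here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: an all-zero spectrum is treated as having zero similarity.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def negative_cosine_loss(predicted, query):
    """L = -cos(y_hat, y): lower is better, -1.0 means identical direction."""
    if len(predicted) != len(query):
        raise ValueError("spectra must have the same number of bins")
    return -cosine_similarity(predicted, query)


# A prediction that matches the query up to scale gives the minimum loss.
loss_same = negative_cosine_loss([1.0, 0.0, 2.0], [2.0, 0.0, 4.0])
# Spectra with no shared peaks give a loss of zero.
loss_disjoint = negative_cosine_loss([1.0, 0.0], [0.0, 1.0])
```

Because cosine similarity ignores overall scale, this loss compares relative peak intensities only, not absolute signal strength.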
“…Future work will seek to optimize and tune the model itself. Current deep learning methods that predict information from the mass spectra use binned spectra and process them with multilayer perceptrons [20,31], a word2vec algorithm inspired by natural language processing [32], or a transformer architecture [57]. A binned spectrum has the obvious drawback of reducing the resolution of the original spectrum, losing the sensitivity provided by the latest generation of mass spectrometers.…”
Section: Discussion (mentioning)
confidence: 99%
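
The resolution loss that the statement above attributes to binned spectra can be seen in a small sketch. It assumes a spectrum is a list of (m/z, intensity) pairs and uses fixed-width bins; the function `bin_spectrum`, the 1 Da bin width, and the example peak values are illustrative assumptions, not details from any of the cited methods.

```python
import math


def bin_spectrum(peaks, mz_min, mz_max, bin_width):
    """Sum peak intensities into fixed-width m/z bins over [mz_min, mz_max)."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Clamp to guard against floating-point rounding at the upper edge.
            idx = min(int((mz - mz_min) // bin_width), n_bins - 1)
            bins[idx] += intensity
    return bins


# Two peaks 0.04 m/z apart are distinct in the raw peak list...
peaks = [(150.02, 10.0), (150.06, 5.0), (200.5, 1.0)]
# ...but merge into a single value once binned at 1 Da.
binned = bin_spectrum(peaks, mz_min=100.0, mz_max=300.0, bin_width=1.0)
```

After binning, the two peaks near m/z 150 are indistinguishable: only their summed intensity survives. Narrower bins reduce this effect at the cost of a longer, sparser input vector.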
“…We believe that it represents the first use of transformers in molecular identification from (mass) spectra. A preprint (dated 26 June 2021) has been lodged [84].…”
Section: Introduction (mentioning)
confidence: 99%