2020
DOI: 10.21105/joss.02411

matchms - processing and similarity evaluation of mass spectrometry data.

Abstract: Mass spectrometry data is at the heart of numerous applications in the biomedical and life sciences. With growing use of high-throughput techniques, researchers need to analyze larger and more complex datasets. In particular through joint effort in the research community, fragmentation mass spectrometry datasets are growing in size and number. Platforms such as MassBank (Horai et al., 2010), GNPS (Wang et al., 2016) or MetaboLights (Haug et al., 2020) serve as an open-access hub for sharing of raw, processed, …

Cited by 76 publications (102 citation statements)
References 16 publications
“…Finally, metabolomics is increasingly used as a tool to understand metabolic profiles and to perform integrative systems biology; furthermore, the use of MS/MS data has been promoted by community based platforms such as MassBank [6], MS2LDA [10] and GNPS [8]. We would like the community to use our novel score and therefore we created the modular matchms [31] and Spec2Vec [32] packages that can easily be incorporated in platforms such as GNPS. Currently, as a start, GNPS users can calculate Spec2Vec scores for spectra in their molecular networks of positive ionisation mode datasets using a pretrained model (see S5 Text which includes links to molecular networks created with release 27; Spec2Vec networks are available for both classical and feature-based [33] molecular network jobs from MolNetEnhancer [34] and GNPS example datasets).…”
Section: Discussion (mentioning)
confidence: 99%
“…A large set of MS/MS spectra was retrieved from GNPS 17 and subsequently curated and cleaned using matchms 19 (see Methods). The resulting training data set contains smiles/InChI annotations for 109,734 spectra, which allowed us to create molecular fingerprints to quantify structural similarities of spectral pairs.…”
Section: Results (mentioning)
confidence: 99%
“…The dataset was retrieved from GNPS (25/01/2021) and contains a total of 210,407 MS/MS spectra. Metadata was cleaned and checked using matchms 19 version 0.8.2, which included cleaning compound names, extracting adduct information from the given metadata, moving metadata to consistent fields and conversions between InChI and SMILES as well as to InChIKeys when missing and when possible. We then ran an automated search against PubChem 35 using pubchempy 36 for spectra which still missed InChI or SMILES annotations.…”
Section: Methods (mentioning)
confidence: 99%
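
The metadata cleaning described in the statement above (cleaning compound names, extracting adduct information, moving metadata to consistent fields) can be illustrated with a minimal standard-library sketch. This is not the matchms API and not the citing authors' code: the function `clean_metadata`, the alias table `FIELD_ALIASES`, and the regular expression `ADDUCT_PATTERN` are hypothetical names chosen for illustration, and the assumption that an adduct appears as a trailing bracketed token in the compound name is ours.

```python
import re

# Hypothetical pattern: a trailing adduct such as "[M+H]+" or "[M-H]-"
# at the end of a compound name (an assumption made for this sketch).
ADDUCT_PATTERN = re.compile(r"\s*(\[M[^\]]*\]\d*[+-])\s*$")

# Hypothetical mapping from variant field names to one consistent name.
FIELD_ALIASES = {
    "name": "compound_name",
    "title": "compound_name",
    "precursormz": "precursor_mz",
    "pepmass": "precursor_mz",
}


def clean_metadata(metadata):
    """Return a cleaned copy of a metadata dictionary.

    Steps: lower-case the keys, move values to consistent field names,
    then split a trailing adduct off the compound name if one is found.
    """
    cleaned = {}
    for key, value in metadata.items():
        key = key.strip().lower()
        cleaned[FIELD_ALIASES.get(key, key)] = value

    if "compound_name" in cleaned:
        name = str(cleaned["compound_name"])
        match = ADDUCT_PATTERN.search(name)
        if match:
            # Keep an adduct already given in the metadata, if any.
            cleaned.setdefault("adduct", match.group(1))
            name = name[:match.start()]
        cleaned["compound_name"] = name.strip()
    return cleaned


example = clean_metadata({"NAME": "Caffeine [M+H]+", "PEPMASS": 195.0877})
print(example)
```

Conversions between InChI, SMILES and InChIKeys, which the statement also mentions, need a cheminformatics toolkit and are left out of this sketch.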
“…Modified cosine spectral similarity. The MatchMS python package version 0.6.2 was used to calculate the modified cosine spectral similarity score and determine the number of matching ions 17 . Each spectrum was square-root intensity scaled and further normalized using the MatchMS function "normalize_intensities".…”
Section: Methods (mentioning)
confidence: 99%
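
The procedure in the statement above (square-root intensity scaling, normalization, then a modified cosine score with a count of matching ions) can be sketched in plain Python. This is a simplified illustration, not the matchms implementation: the function names `scale_and_normalize` and `modified_cosine`, the default `tolerance` of 0.1, and the greedy peak assignment are assumptions made for the sketch. Normalization here divides by the largest intensity, and a peak pair counts as a match when the m/z values agree either directly or after shifting by the precursor m/z difference.

```python
import math


def scale_and_normalize(peaks):
    """Square-root scale intensities, then divide by the largest one.

    `peaks` is a non-empty list of (mz, intensity) tuples with
    positive intensities.
    """
    scaled = [(mz, math.sqrt(intensity)) for mz, intensity in peaks]
    top = max(intensity for _, intensity in scaled)
    return [(mz, intensity / top) for mz, intensity in scaled]


def modified_cosine(peaks_a, precursor_a, peaks_b, precursor_b, tolerance=0.1):
    """Return (score, number_of_matched_peak_pairs).

    Greedy sketch: candidate pairs are ranked by intensity product and
    each peak is used at most once.
    """
    shift = precursor_a - precursor_b
    candidates = []
    for i, (mz_a, int_a) in enumerate(peaks_a):
        for j, (mz_b, int_b) in enumerate(peaks_b):
            direct = abs(mz_a - mz_b) <= tolerance
            shifted = abs(mz_a - (mz_b + shift)) <= tolerance
            if direct or shifted:
                candidates.append((int_a * int_b, i, j))

    # Assign the strongest candidate pairs first.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    total = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += product

    norm_a = math.sqrt(sum(intensity ** 2 for _, intensity in peaks_a))
    norm_b = math.sqrt(sum(intensity ** 2 for _, intensity in peaks_b))
    return total / (norm_a * norm_b), len(used_a)


# Two toy spectra whose fragments differ by the precursor mass difference.
spectrum_a = scale_and_normalize([(100.0, 4.0), (150.0, 16.0)])
spectrum_b = scale_and_normalize([(110.0, 4.0), (160.0, 16.0)])
score, matches = modified_cosine(spectrum_a, 200.0, spectrum_b, 210.0)
print(score, matches)
```

In the toy example no peaks match directly, so a plain cosine score would be zero. Both pairs match only after the 10 Da precursor shift, which is the case the modified score is designed to capture.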