SIMPLE: Sparse Interaction Model over Peaks of moLEcules for fast, interpretable metabolite identification from tandem mass spectra

Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi

doi:10.1093/bioinformatics/bty252

Cited by 28 publications

(27 citation statements)

References 18 publications


“…A drawback of ADAPTIVE would be interpretability, because structural information is implicitly encoded in compact vectors in ADAPTIVE and cannot be made explicit easily. In metabolite identification, it would be desirable to connect the set of peaks to the corresponding substructures/chemical properties of metabolites (Nguyen et al., 2018b). Developing a model with such interpretability would be interesting future work.…”

Section: Discussion (mentioning)

confidence: 99%

“…Thus, we can say that kernel-based supervised learning, particularly complex kernels, have a computation issue, regardless of high performance in prediction. On the other hand, a sparse learning model, namely SIMPLE (Nguyen et al., 2018b), considers a simpler function than kernels for fingerprint, while interactions of peaks in spectra can be incorporated into learning models explicitly. SIMPLE achieved a comparable performance against kernel-based learning, reducing the computational cost drastically.…”

Section: Related Work (mentioning)

confidence: 99%
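
To make the idea of "incorporating interactions of peaks explicitly" concrete, here is a minimal sketch of a scoring function with a per-peak term plus a sparse pairwise term. The functional form, the binned-spectrum input, and all names (`interaction_score`, `w`, `W`, `b`) are assumptions for illustration; the quoted statement does not specify the model, and this is not presented as the authors' implementation.

```python
def interaction_score(x, w, W, b=0.0):
    """Score one fingerprint bit for a binned spectrum (hypothetical form).

    x: list of peak intensities, one per m/z bin
    w: list of main-effect weights, one per bin
    W: dict mapping a bin pair (i, j) to an interaction weight;
       pairs absent from the dict contribute nothing (sparsity)
    b: bias term
    """
    # Main effects: each peak contributes on its own.
    main = sum(wi * xi for wi, xi in zip(w, x))
    # Pairwise effects: a pair contributes only when both peaks are present.
    pair = sum(wij * x[i] * x[j] for (i, j), wij in W.items())
    return b + main + pair


# Toy spectrum with four bins; values are made up for illustration.
x = [0.0, 1.0, 0.5, 0.0]
w = [0.0, 2.0, 0.0, 0.0]
W = {(1, 2): 4.0}

score = interaction_score(x, w, W, b=-1.0)
predicted_bit = score > 0  # threshold at zero to predict the bit
```

Because the nonzero entries of `w` and `W` name specific peaks and peak pairs, a model of this shape can be read off directly, which is the kind of interpretability the statements above contrast with kernel methods.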

“…Recent advances in metabolite identification have been led by the machine learning category (e.g. Brouard et al., 2016; Dührkop et al., 2015; Nguyen et al., 2018b). This category can be further divided into two key groups: supervised learning for substructure prediction and unsupervised learning for substructure annotation.…”

Section: Introduction (mentioning)

confidence: 99%

“…The prediction can be divided into two steps: (i) fingerprint prediction: predicting fingerprints of a given test spectrum with supervised learning; (ii) candidate retrieval: retrieving chemical compound (from database) which is closest to the predicted fingerprints (Nguyen et al., 2018b).…”

Section: Introduction (mentioning)

confidence: 99%
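
The candidate-retrieval step (ii) described above can be sketched as ranking database compounds by how close their fingerprints are to the predicted one. This is a minimal sketch under stated assumptions: Tanimoto similarity is used as the closeness measure (the quoted statement does not name one), fingerprints are plain 0/1 lists, and the names `tanimoto`, `retrieve`, and the `cand_*` entries are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    both = sum(1 for p, q in zip(a, b) if p and q)
    either = sum(1 for p, q in zip(a, b) if p or q)
    # Two all-zero fingerprints share no set bits; treat them as dissimilar.
    return both / either if either else 0.0


def retrieve(predicted, candidates):
    """Return candidate names sorted from most to least similar."""
    return sorted(
        candidates,
        key=lambda name: tanimoto(predicted, candidates[name]),
        reverse=True,
    )


# Step (i) would produce this vector from a spectrum; here it is made up.
predicted = [1, 0, 1, 1, 0]

# Toy "database" of candidate fingerprints (illustrative values only).
candidates = {
    "cand_A": [1, 0, 1, 0, 0],
    "cand_B": [1, 0, 1, 1, 0],
    "cand_C": [0, 1, 0, 0, 1],
}

ranking = retrieve(predicted, candidates)
```

In this toy case `cand_B` matches the predicted fingerprint exactly and ranks first, `cand_A` shares two of three set bits, and `cand_C` shares none.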


ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra

Nguyen, Mamitsuka. 2019. Bioinformatics. Self-citation.

Motivation: Metabolite identification is an important task in metabolomics to enhance the knowledge of biological systems. There have been a number of machine learning-based methods proposed for this task, which predict a chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching chemical compounds (in database) corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large to cover all possible substructures and chemical properties, and therefore heavily redundant, in the sense of having many molecular (sub)structures irrelevant to the task, causing limited predictive performance and slow prediction.

Results: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data, to be consistent with both spectra and chemical structures of metabolites. In more detail, molecular vectors are generated by a model, being parameterized by a message passing neural network, and parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and importantly adaptive (specific) to both given data and task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method of metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE by using a benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency.

Availability and implementation: The code will be accessed through http://www.bic.kyoto-u.ac.jp/pathway/tools/ADAPTIVE after the acceptance of this article.


“…In recent years numerous powerful approaches (Nguyen et al., 2018a; Schymanski et al., 2017) for annotating MS2 spectra with a predicted molecular structure have been developed (Ruttkies et al., 2016, 2019; Dührkop et al., 2015; Brouard et al., 2016; Allen et al., 2014; Nguyen et al., 2018b, 2019; Dührkop et al., 2019). Typically, these methods output a ranked list of molecular structure candidates, that can be shown to human experts, or further post-processed, e.g.…”

Section: Introduction (mentioning)

confidence: 99%

Probabilistic Framework for Integration of Mass Spectrum and Retention Time Information in Small Molecule Identification

Bach, Rogers, Williamson et al. 2020. Preprint.

Motivation: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve the MS based identifications. Results: We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show improved identification accuracy by combining tandem mass spectrometry data (MS2) and retention orders using our approach, thereby outperforming state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of LC-MS features have MS2 measurements available besides MS1.


Data‐Driven Compound Identification in Atmospheric Mass Spectrometry

Sandström, Rissanen, Rousu et al. 2023. Advanced Science.

Aerosol particles found in the atmosphere affect the climate and worsen air quality. To mitigate these adverse impacts, aerosol particle formation and aerosol chemistry in the atmosphere need to be better mapped out and understood. Currently, mass spectrometry is the single most important analytical technique in atmospheric chemistry and is used to track and identify compounds and processes. Large amounts of data are collected in each measurement of current time‐of‐flight and orbitrap mass spectrometers using modern rapid data acquisition practices. However, compound identification remains a major bottleneck during data analysis due to lacking reference libraries and analysis tools. Data‐driven compound identification approaches could alleviate the problem, yet remain rare to non‐existent in atmospheric science. In this perspective, the authors review the current state of data‐driven compound identification with mass spectrometry in atmospheric science and discuss current challenges and possible future steps toward a digital era for atmospheric mass spectrometry.
