We engineered a machine learning approach, MSHub, to enable auto-deconvolution of gas chromatography-mass spectrometry (GC-MS) data. We then designed workflows to enable the community to store, process, share, annotate, compare and perform molecular networking of GC-MS data within the Global Natural Product Social (GNPS) Molecular Networking analysis platform. MSHub/GNPS performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples.

Given its ease of use and low operational cost, GC-MS has applications with broad societal effect, such as detection of metabolic disease in newborns, toxicology, doping, forensics, food science and clinical testing. The predominant ionization technique in GC-MS is electron ionization (EI), in which all compounds are ionized by high-energy (70-eV) electrons. Because fragmentation occurs with ionization, EI GC-MS data are subjected to spectral deconvolution, a process that separates fragmentation ion patterns for each eluting molecule into a composite mass spectrum.

The 70-eV energy for ionizing electrons has long been the standard in GC-MS, making it possible to use decades-old EI reference spectra for annotation [1]. There are ~1.2 million reference spectra that have been accumulated and curated over a period of more than 50 years [2]. Many tools and repositories for GC-MS data have been introduced [3-15]; however, much of GC-MS data processing is restricted to vendor-specific formats and software [8]. Currently, deconvolution requires either setting multiple parameters manually [3-5] or possessing the computational skills to run the software [7].
Also, the lack of data sharing in a uniform format precludes data comparison between laboratories and prevents taking advantage of repository-scale information and community knowledge, resulting in infrequent reuse of GC-MS data [8,11-15].

Although batch modes exist, deconvolution quality is currently not enhanced by using information from all other files. To leverage across-file information, improve scalability of spectral deconvolution and eliminate the need for manually setting the deconvolution parameters (m/z error correction of the ions; peak shape, that is, the slopes of the rising and trailing edges; peak RT shifts; and noise/intensity thresholds), we developed an algorithmic learning strategy for auto-deconvolution (Fig. 1a-f). We deployed this functionality within GNPS/MassIVE (https://gnps.ucsd.edu) [16] (Fig. 1f-i). To promote analysis reproducibility, all GNPS jobs performed are retained in the 'My User' space and can be shared as hyperlinks.

This user-independent 'automatic' parameter optimization is accomplished via fast Fourier transform (FFT), multiplication and inverse Fourier transform for each ion across an entire data set, followed by an unsupervised non-negative matrix factorization (NMF; a one-layer neural network). Then, the compositional consistency of spectral patterns for each spec...
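
To illustrate the NMF step mentioned above, the following is a minimal sketch of how a non-negative factorization can separate two co-eluting compounds into elution profiles and fragmentation spectra. It is not MSHub's implementation: the Lee-Seung multiplicative update rule, the function name `nmf_multiplicative`, the synthetic peak shapes and the six-channel spectra are all assumptions chosen for illustration.

```python
import numpy as np


def nmf_multiplicative(X, n_components, n_iter=1000, seed=0, eps=1e-12):
    """Factor a non-negative matrix X (scans x m/z channels) as W @ H.

    W (scans x k) holds elution profiles and H (k x m/z) holds spectra.
    Uses Lee-Seung multiplicative updates for the Frobenius objective;
    this solver is an assumption, not necessarily what MSHub uses.
    """
    rng = np.random.default_rng(seed)
    n_scans, n_mz = X.shape
    # Strictly positive random start so the updates never divide by zero.
    W = rng.random((n_scans, n_components)) + eps
    H = rng.random((n_components, n_mz)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic example (hypothetical data): two overlapping Gaussian
# elution peaks, each with its own fragmentation pattern.
t = np.arange(60)
profile_a = np.exp(-0.5 * ((t - 25) / 3.0) ** 2)
profile_b = np.exp(-0.5 * ((t - 32) / 3.0) ** 2)
spectrum_a = np.array([100.0, 0.0, 40.0, 0.0, 10.0, 0.0])
spectrum_b = np.array([0.0, 80.0, 0.0, 50.0, 0.0, 20.0])

# Observed data: the sum of both compounds' contributions per scan.
X = np.outer(profile_a, spectrum_a) + np.outer(profile_b, spectrum_b)

W, H = nmf_multiplicative(X, n_components=2)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Each row of `H` approximates one compound's spectrum up to a scale factor, and the matching column of `W` approximates its elution profile. The order of the components is arbitrary, so matching them to known compounds would need a separate comparison against reference spectra.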