METLIN originated as a database to characterize known metabolites and has since expanded into a technology platform for the identification of known and unknown metabolites and other chemical entities. Through this effort it has become a comprehensive resource containing over 1 million molecules, including lipids, amino acids, carbohydrates, toxins, small peptides, and natural products, among other classes. METLIN's high-resolution tandem mass spectrometry (MS/MS) database, which plays a key role in the identification process, contains data generated from both reference standards and their labeled stable isotope analogues, facilitated by METLIN-guided analysis of isotope-labeled microorganisms. The MS/MS data, coupled with the fragment similarity search function, extend the tool's capabilities to the identification of unknowns. Fragment similarity search is performed independently of the precursor mass, relying solely on the fragment ions to identify similar structures within the database. Stable isotope data also facilitate characterization by coupling the similarity search output with the isotopic m/z shifts. Examples of both are demonstrated here with the characterization of four previously unknown metabolites. METLIN now also features in silico MS/MS data, made possible by algorithms trained on METLIN's MS/MS data from both standards and their isotope analogues. With these informatic and experimental data features, METLIN is being designed to address the characterization of known and unknown molecules.
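The two ideas in the paragraph above, precursor-independent fragment matching and counting labeled atoms from isotopic m/z shifts, can be illustrated with a short sketch. It is not METLIN's scoring function, which the text does not specify: it assumes a plain cosine score over fragment peaks matched within a fixed 0.01 Da tolerance and uniform 13C labeling, and the function names are hypothetical.

```python
import math

# Mass difference between 13C and 12C (Da); assumes uniform 13C labeling.
C13_SHIFT = 1.0033548


def fragment_similarity(query, reference, tol=0.01):
    """Cosine similarity computed from fragment peaks only.

    query, reference: lists of (mz, intensity) tuples. The precursor m/z is
    deliberately not an input, so spectra from molecules of different mass
    can still score as similar. Each query peak is paired with the closest
    unused reference peak within `tol` Da.
    """
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        best = None  # (m/z difference, reference index, reference intensity)
        for j, (mz_r, int_r) in enumerate(reference):
            diff = abs(mz_q - mz_r)
            if j not in used and diff <= tol and (best is None or diff < best[0]):
                best = (diff, j, int_r)
        if best is not None:
            used.add(best[1])
            dot += int_q * best[2]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


def count_labeled_atoms(mz_unlabeled, mz_labeled, charge=1, shift=C13_SHIFT):
    """Estimate how many labeled atoms an ion carries from its m/z shift."""
    return round((mz_labeled - mz_unlabeled) * charge / shift)


if __name__ == "__main__":
    query = [(120.0808, 100.0), (103.0542, 20.0), (91.0542, 15.0)]
    library = [(120.0810, 95.0), (103.0540, 25.0), (77.0386, 10.0)]
    print(fragment_similarity(query, library))
    # [M+H]+ of glucose vs. a fully 13C-labeled analogue: prints 6
    print(count_labeled_atoms(181.0707, 187.0908))
```

Applied to fragment ions rather than precursors, the same shift count indicates how many labeled atoms each fragment retains, which narrows down which part of the molecule it came from.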
Metabolomics, in which small-molecule metabolites (the metabolome) are identified and quantified, is broadly acknowledged to be the omics discipline that is closest to the phenotype 1–3. Although appreciated for its role in biomarker discovery programs, metabolomics can also be used to identify metabolites that could alter a cell's or an organism's phenotype. Metabolomics activity screening (MAS) as described here integrates metabolomics data with metabolic pathways and systems biology information, including proteomics and transcriptomics data, to produce a set of endogenous metabolites that can be tested for functionality in altering phenotypes. A growing literature reports the use of metabolites to modulate diverse processes, such as stem cell differentiation, oligodendrocyte maturation, insulin signaling, T-cell survival and macrophage immune responses. This opens up the possibility of identifying and applying metabolites to affect phenotypes. Unlike genes or proteins, metabolites are often readily available, which means that MAS is broadly amenable to high-throughput screening of virtually any biological system.
Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes, such as mass spectrometry fragmentation and chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy because of the limited experimental data available. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
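The annotation experiment can be sketched as ranking each candidate by how close its predicted retention time lies to the observed one, then counting how often the true identity falls in the top three. This is a minimal illustration, assuming predicted retention times are already available (the deep learning model itself is not reproduced) and that ranking uses absolute retention time error; the function names are hypothetical.

```python
def rank_by_retention_time(observed_rt, candidates):
    """Return candidate names sorted by |predicted RT - observed RT|, ascending.

    candidates: dict mapping a candidate name to its predicted retention time,
    in the same unit as observed_rt (prediction itself is out of scope here).
    """
    return sorted(candidates, key=lambda name: abs(candidates[name] - observed_rt))


def top_k_accuracy(cases, k=3):
    """Fraction of cases whose true identity is among the top-k ranked candidates.

    cases: list of (observed_rt, candidates, true_name) tuples.
    """
    if not cases:
        return 0.0
    hits = 0
    for observed_rt, candidates, true_name in cases:
        if true_name in rank_by_retention_time(observed_rt, candidates)[:k]:
            hits += 1
    return hits / len(cases)


if __name__ == "__main__":
    # Hypothetical candidates sharing one accurate mass, with predicted RTs (s)
    candidates = {"isomer_a": 500.0, "isomer_b": 310.0, "isomer_c": 250.0}
    print(rank_by_retention_time(300.0, candidates))
    print(top_k_accuracy([(300.0, candidates, "isomer_c")], k=3))
```

Because candidates typically come from an accurate mass search, they often share the same formula, so retention time acts as an orthogonal filter rather than a standalone identifier.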
Metabolite identification is still considered an imposing bottleneck in liquid chromatography mass spectrometry (LC/MS) untargeted metabolomics. The identification workflow usually begins with detecting relevant LC/MS peaks via peak-picking algorithms and retrieving putative identities based on accurate mass searching. However, accurate mass search alone provides poor evidence for metabolite identification. For this reason, computational annotation is used to reveal the underlying metabolites' monoisotopic masses, improving putative identification and complementing confirmation by tandem mass spectrometry. This review examines LC/MS data from a computational and analytical perspective, focusing on the occurrence of neutral losses and in-source fragments, to understand the challenges in computational annotation methodologies. Herein, we examine the state-of-the-art strategies for computational annotation, including: (i) peak grouping or full scan (MS1) pseudo-spectra extraction, i.e., clustering all mass spectral signals stemming from each metabolite; (ii) annotation using ion adducts and mass distances among ion peaks; (iii) incorporation of biological knowledge such as biotransformations or pathways; (iv) tandem MS data; and (v) metabolite retention time calibration, usually achieved by prediction from molecular descriptors. Advantages and pitfalls of each of these strategies are discussed, as well as expected future trends in computational annotation.
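Strategy (ii) relies on the fact that co-eluting ions of one metabolite differ by fixed, known mass offsets, so a matching pair reveals the neutral monoisotopic mass. A minimal sketch, assuming positive ionization, singly charged ions, peaks already grouped as in strategy (i), and a hypothetical four-entry adduct table with a 5 ppm tolerance; real annotation tools use far larger rule sets.

```python
# Singly charged positive-mode adducts: mass (Da) added to the neutral
# monoisotopic mass M. A small illustrative subset, not a complete table.
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def annotate_adduct_pairs(mzs, tol_ppm=5.0):
    """Find pairs of co-eluting peaks explainable as two adducts of one molecule.

    mzs: m/z values of peaks already grouped into one pseudo-spectrum.
    Returns (i, j, adduct_i, adduct_j, neutral_mass) tuples.
    """
    hits = []
    for i in range(len(mzs)):
        for j in range(i + 1, len(mzs)):
            for name_i, shift_i in ADDUCTS.items():
                for name_j, shift_j in ADDUCTS.items():
                    if name_i == name_j:
                        continue
                    m_i = mzs[i] - shift_i
                    m_j = mzs[j] - shift_j
                    if m_j <= 0:
                        continue
                    # Both peaks must imply the same neutral mass within tolerance.
                    if abs(ppm_error(m_i, m_j)) <= tol_ppm:
                        hits.append((i, j, name_i, name_j, (m_i + m_j) / 2))
    return hits


if __name__ == "__main__":
    # [M+H]+ and [M+Na]+ of a hexose such as glucose (neutral mass 180.0634)
    print(annotate_adduct_pairs([181.0707, 203.0526]))
```

The inferred neutral mass, rather than each raw m/z value, can then feed the accurate mass search, so that one query is made per metabolite instead of one per ion species.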
Heme is an essential prosthetic group of numerous proteins and a central signaling molecule in many physiologic processes 1,2. The chemical reactivity of heme requires that a network of intracellular chaperone proteins exist to avert the cytotoxic effects of free heme, but the constituents of such trafficking pathways are unknown 3,4. Heme synthesis is completed in mitochondria, with ferrochelatase (FECH) adding iron to protoporphyrin IX. How this vital but …