Background: The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest (www.casmi-contest.org) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3, without and with metadata, from organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and small molecule annotation/identification.
Results: The Input Output Kernel Regression (CSI:IOKR) machine learning approach performed best in “Category 2: Best Automatic Structural Identification—In Silico Fragmentation Only”, won by Team Brouard with 41% challenge wins. The winner of “Category 3: Best Automatic Structural Identification—Full Information” was Team Kind (MS-FINDER), with 76% challenge wins. The best methods were able to achieve over 30% Top 1 ranks in Category 2, with all methods ranking the correct candidate in the Top 10 in around 50% of challenges. This success rate rose to 70% Top 1 ranks in Category 3, with candidates in the Top 10 in over 80% of the challenges. The machine learning and chemistry-based approaches are shown to perform in complementary ways.
Conclusions: The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial. The achieved high rates of correct candidates in the Top 1 and Top 10, despite large candidate numbers, open up great possibilities for high-throughput annotation of untargeted analysis for “known unknowns”. As more high quality training data becomes available, the improvements in machine learning methods will likely continue, but the alternative approaches still provide valuable complementary information. Improved integration of experimental context will also improve identification success further for “real life” annotations. The true “unknown unknowns” remain to be evaluated in future CASMI contests.
Electronic supplementary material: The online version of this article (doi:10.1186/s13321-017-0207-1) contains supplementary material, which is available to authorized users.
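The Top 1 and Top 10 figures quoted above are rates over challenges: the share of challenges in which the correct candidate was ranked at or above a given position. A minimal sketch of that calculation follows. The function name `top_k_rate` and the example ranks are hypothetical and are not taken from the CASMI evaluation, which may treat ties and missing candidates differently.

```python
from typing import Optional, Sequence


def top_k_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of challenges whose correct candidate is ranked within the top k.

    Each entry is the 1-based rank of the correct candidate in one challenge,
    or None if the correct candidate was not in the submitted list.
    Assumption: ties are already resolved before ranks are passed in.
    """
    if not ranks:
        raise ValueError("at least one challenge is required")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Illustrative ranks for ten made-up challenges (not contest data).
example_ranks = [1, 3, 12, None, 1, 7, 2, 45, 1, 10]

top1 = top_k_rate(example_ranks, 1)
top10 = top_k_rate(example_ranks, 10)
print(f"Top 1: {top1:.0%}, Top 10: {top10:.0%}")
```

Counting a missing candidate as a miss, rather than dropping the challenge, keeps the denominator equal to the total number of challenges.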
Urine metabolites are used in many clinical and biomedical studies but usually only for a few classic compounds. Metabolomics detects vastly more metabolic signals that may be used to precisely define the health status of individuals. However, many compounds remain unidentified, hampering biochemical conclusions. Here, we annotate all metabolites detected by two untargeted metabolomic assays, hydrophilic interaction chromatography (HILIC)-Q Exactive HF mass spectrometry and charged surface hybrid (CSH)-Q Exactive HF mass spectrometry. Over 9,000 unique metabolite signals were detected, of which 42% triggered MS/MS fragmentations in data-dependent mode. At the highest Metabolomics Standards Initiative (MSI) confidence level, level 1, we identified 175 compounds using authentic standards with precursor mass, retention time, and MS/MS matching. An additional 578 compounds were annotated by precursor accurate mass and MS/MS matching alone (MSI level 2), including with a novel library specifically geared at acylcarnitines (CarniBlast). The rest of the metabolome is usually left unannotated. To fill this gap, we used the in silico fragmentation tool CSI:FingerID and the new NIST hybrid search to annotate all further compounds (MSI level 3). Testing the top-ranked metabolites in CSI:FingerID annotations yielded 40% accuracy when applied to the MSI level 1 identified compounds. We classified all MSI level 3 annotations by the NIST hybrid search using the ClassyFire ontology into 21 superclasses that were further distinguished into 184 chemical classes. ClassyFire annotations showed that the previously unannotated urine metabolome consists of 28% derivatives of organic acids, 16% heterocyclics, and 16% lipids as major classes.
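Precursor accurate mass matching, as used for the MSI level 1 and 2 annotations above, is commonly expressed as a mass error in parts per million (ppm) between an observed and a theoretical m/z. The sketch below illustrates that check only. The 5 ppm tolerance, the function names, and the m/z values are assumptions chosen for illustration and are not reported in the abstract.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million relative to the theoretical m/z."""
    if theoretical_mz <= 0:
        raise ValueError("theoretical m/z must be positive")
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def matches_precursor(observed_mz: float, theoretical_mz: float,
                      tol_ppm: float = 5.0) -> bool:
    """True if the observed precursor lies within tol_ppm of the theoretical m/z.

    Assumption: a symmetric 5 ppm window; the tolerance used in the study
    is not stated in the abstract.
    """
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Made-up m/z values, chosen only to show one match and one mismatch.
close_error = ppm_error(200.0004, 200.0000)   # about 2 ppm
far_error = ppm_error(200.0040, 200.0000)     # about 20 ppm
print(f"{close_error:.1f} ppm -> match: {matches_precursor(200.0004, 200.0000)}")
print(f"{far_error:.1f} ppm -> match: {matches_precursor(200.0040, 200.0000)}")
```

A precursor match on its own narrows the candidates to a mass window; the annotation levels described above additionally require MS/MS matching and, for level 1, retention time agreement with an authentic standard.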
Identification of unknown metabolites is the bottleneck in advancing metabolomics, leaving interpretation of metabolomics results ambiguous. The chemical diversity of metabolism is vast, making structure identification arduous and time consuming. Currently, comprehensive analysis of mass spectra in metabolomics is limited to library matching, but tandem mass spectral libraries are small compared to the large number of compounds found in the biosphere, including xenobiotics. Resolving this bottleneck requires richer data acquisition and better computational tools. Multi-stage mass spectrometry (MSn) trees show promise to aid in this regard. Fragmentation trees explore the fragmentation process, generate fragmentation rules and aid in sub-structure identification, while mass spectral trees delineate the dependencies in multi-stage MS of collision-induced dissociations. This review covers advancements in these trees as a tool for metabolite identification over the past 10 years, including algorithms, software and databases used to build and to implement fragmentation trees and mass spectral annotations.
Mouse knockouts facilitate the study of gene functions. Often, multiple abnormal phenotypes are induced when a gene is inactivated. The International Mouse Phenotyping Consortium (IMPC) has generated thousands of mouse knockouts and catalogued their phenotype data. We have acquired metabolomics data from 220 plasma samples from 30 unique mouse gene knockouts and corresponding wildtype mice from the IMPC. To acquire comprehensive metabolomics data, we have used liquid chromatography (LC) combined with mass spectrometry (MS) for detecting polar and lipophilic compounds in an untargeted approach. We have also used targeted methods to measure bile acids, steroids and oxylipins. In addition, we have used gas chromatography time-of-flight mass spectrometry (GC-TOFMS) for measuring primary metabolites. The metabolomics dataset reports 832 unique structurally identified metabolites from 124 chemical classes as determined by ChemRICH software. The GCMS and LCMS raw data files, intermediate and finalized data matrices, R-Scripts, annotation databases, and extracted ion chromatograms are provided in this data descriptor. The dataset can be used for subsequent studies to link genetic variants with molecular mechanisms and phenotypes.