Background
In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases.

Description
Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted.

Conclusions
MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.

Graphical abstract
MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.

Electronic supplementary material
The online version of this article (doi:10.1186/s13321-015-0087-1) contains supplementary material, which is available to authorized users.
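The accurate-mass annotation step mentioned above can be illustrated with a minimal sketch. It assumes a positive-mode [M+H]+ adduct and a small hand-written candidate list; the names `CANDIDATES`, `ppm_error`, and `annotate_peak` and the 5 ppm default tolerance are hypothetical choices for illustration and are not part of the MINE web tools or APIs.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da

# Hypothetical candidate list: (name, neutral monoisotopic mass in Da).
# A real search would draw candidates from a compound database instead.
CANDIDATES = [
    ("pyruvate", 88.016044),
    ("D-glucose", 180.063388),
    ("citrate", 192.027003),
]


def ppm_error(observed, theoretical):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate_peak(mz, candidates, tolerance_ppm=5.0):
    """Return (name, ppm error) for candidates whose [M+H]+ m/z matches mz.

    Assumes a singly charged, protonated adduct; other adducts would need
    their own mass shifts.
    """
    hits = []
    for name, neutral_mass in candidates:
        theoretical_mz = neutral_mass + PROTON_MASS
        error = ppm_error(mz, theoretical_mz)
        if abs(error) <= tolerance_ppm:
            hits.append((name, round(error, 2)))
    # Closest match first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    print(annotate_peak(181.0707, CANDIDATES))
```

A tighter tolerance returns fewer candidates per peak, which is the same trade-off the abstract describes when comparing a focused database with a general chemical one.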
For over 10 years, ModelSEED has been a primary resource for the construction of draft genome-scale metabolic models based on annotated microbial or plant genomes. Now being released, the biochemistry database serves as the foundation of biochemical data underlying ModelSEED and KBase. The biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources by: (i) including compartmentalization, transport reactions, charged molecules and proton balancing on reactions; (ii) being extensible by the user community, with all data stored in GitHub; and (iii) being designed as a biochemical ‘Rosetta Stone’ to facilitate comparison and integration of annotations from many different tools and databases. The database was constructed by combining chemical data from many resources, applying standard transformations, identifying redundancies and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. Ontologies can be designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. ModelSEED now includes 33,978 compounds and 36,645 reactions, available as a set of extensible files on GitHub, and available to search at https://modelseed.org and KBase.
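To make the idea of charged molecules and proton balancing concrete, the following is a minimal sketch of an atom and charge balance check on a single reaction. The data layout, the function names `parse_formula` and `imbalance`, and the ATP hydrolysis example with its assumed protonation states are illustrative assumptions, not the ModelSEED file format or code.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C10H12N5O13P3' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(reactants, products):
    """Return (atom_diff, charge_diff), computed as products minus reactants.

    Each side is a list of (coefficient, formula, charge) tuples.
    A balanced reaction yields ({}, 0).
    """
    atoms = Counter()
    charge = 0
    for sign, side in ((-1, reactants), (1, products)):
        for coeff, formula, z in side:
            for element, n in parse_formula(formula).items():
                atoms[element] += sign * coeff * n
            charge += sign * coeff * z
    return {e: n for e, n in atoms.items() if n != 0}, charge


if __name__ == "__main__":
    # Assumed protonation states: ATP(4-) + H2O -> ADP(3-) + HPO4(2-) + H(+)
    reactants = [(1, "C10H12N5O13P3", -4), (1, "H2O", 0)]
    products = [(1, "C10H12N5O10P2", -3), (1, "HO4P", -2), (1, "H", 1)]
    print(imbalance(reactants, products))      # balanced with the proton
    print(imbalance(reactants, products[:2]))  # proton omitted
```

Leaving out the product proton leaves the reaction one hydrogen and one unit of charge short, which is the kind of error a proton-balancing pass is meant to catch.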
Many common metabolites are intrinsically unstable and reactive, and hence prone to chemical (i.e. non-enzymatic) damage in vivo. Although this fact is widely recognized, the purely chemical side-reactions of metabolic intermediates can be surprisingly hard to track down in the literature and are often treated in an unprioritized case-by-case way. Moreover, spontaneous chemical side-reactions tend to be overshadowed today by side-reactions mediated by promiscuous ('sloppy') enzymes even though chemical damage to metabolites may be even more prevalent than damage from enzyme sloppiness, has similar outcomes, and is held in check by similar biochemical repair or pre-emption mechanisms. To address these limitations and imbalances, here we draw together and systematically integrate information from the (bio)chemical literature, from cheminformatics, and from genome-scale metabolic models to objectively define a 'Top 30' list of damage-prone metabolites. A foundational part of this process was to derive general reaction rules for the damage chemistries involved. The criteria for a 'Top 30' metabolite included predicted chemical reactivity, essentiality, and occurrence in diverse organisms. We also explain how the damage chemistry reaction rules ('operators') are implemented in the Chemical-Damage-MINE (CD-MINE) database (minedatabase.mcs.anl.gov/#/top30) to provide a predictive tool for many additional potential metabolite damage products. Lastly, we illustrate how defining a 'Top 30' list can drive genomics-enabled discovery of the enzymes of previously unrecognized damage-control systems, and how applying chemical damage reaction rules can help identify previously unknown peaks in metabolomics profiles.