MassBank is the first public repository of mass spectra of small chemical compounds for life sciences (<3000 Da). The database contains 605 electron-ionization mass spectrometry (EI-MS), 137 fast atom bombardment MS and 9276 electrospray ionization (ESI)-MS(n) data of 2337 authentic compounds of metabolites, 11 545 EI-MS and 834 other-MS data of 10,286 volatile natural and synthetic compounds, and 3045 ESI-MS(2) data of 679 synthetic drugs contributed by 16 research groups (January 2010). ESI-MS(2) data were analyzed under nonstandardized, independent experimental conditions. MassBank is a distributed database. Each research group provides data from its own MassBank data servers distributed on the Internet. MassBank users can access either all of the MassBank data or a subset of the data by specifying one or more experimental conditions. In a spectral search to retrieve mass spectra similar to a query mass spectrum, the similarity score is calculated by a weighted cosine correlation in which weighting exponents on peak intensity and the mass-to-charge ratio are optimized to the ESI-MS(2) data. MassBank also provides a merged spectrum for each compound prepared by merging the analyzed ESI-MS(2) data on an identical compound under different collision-induced dissociation conditions. Data merging has significantly improved the precision of the identification of a chemical compound by 21-23% at a similarity score of 0.6. Thus, MassBank is useful for the identification of chemical compounds and the publication of experimental data.
We identified the sequence-specific starting positions of consecutive miscalls in the mapping of reads obtained from the Illumina Genome Analyser (GA). Detailed analysis of the miscall pattern indicated that the underlying mechanism involves sequence-specific interference of the base elongation process during sequencing. The two major sequence patterns that trigger this sequence-specific error (SSE) are: (i) inverted repeats and (ii) GGC sequences. We speculate that these sequences favor dephasing by inhibiting single-base elongation, by: (i) folding single-stranded DNA and (ii) altering enzyme preference. This phenomenon is a major cause of sequence coverage variability and of the unfavorable bias observed for population-targeted methods such as RNA-seq and ChIP-seq. Moreover, SSE is a potential cause of false single-nucleotide polymorphism (SNP) calls and also significantly hinders de novo assembly. This article highlights the importance of recognizing SSE and its underlying mechanisms in the hope of enhancing the potential usefulness of the Illumina sequencers.
A database (DB) describing the relationships between species and their metabolites would be useful for metabolomics research, because it targets systematic analysis of enormous numbers of organic compounds with known or unknown structures in metabolomics. We constructed an extensive species-metabolite DB for plants, the KNApSAcK Core DB, which contains 101,500 species-metabolite relationships encompassing 20,741 species and 50,048 metabolites. We also developed a search engine within the KNApSAcK Core DB for use in metabolomics research, making it possible to search for metabolites based on an accurate mass, molecular formula, metabolite name or mass spectra in several ionization modes. We also have developed databases for retrieving metabolites related to plants used for a range of purposes. In our multifaceted plant usage DB, medicinal/edible plants are related to the geographic zones (GZs) where the plants are used, their biological activities, and formulae of Japanese and Indonesian traditional medicines (Kampo and Jamu, respectively). These data are connected to the species-metabolites relationship DB within the KNApSAcK Core DB, keyed via the species names. All databases can be accessed via the website http://kanaya.naist.jp/KNApSAcK_Family/. KNApSAcK WorldMap DB comprises 41,548 GZ-plant pair entries, including 222 GZs and 15,240 medicinal/edible plants. The KAMPO DB consists of 336 formulae encompassing 278 medicinal plants; the JAMU DB consists of 5,310 formulae encompassing 550 medicinal plants. The Biological Activity DB consists of 2,418 biological activities and 33,706 pairwise relationships between medicinal plants and their biological activities. Current statistics of the binary relationships between individual databases were characterized by the degree distribution analysis, leading to a prediction of at least 1,060,000 metabolites within all plants. In the future, the study of metabolomics will need to take this huge number of metabolites into consideration.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.