Metabolomics is distinct from conventional metabolism studies in that it addresses cellular activity as a whole rather than focusing only on individual enzymes, reactions, or metabolites. Metabolomics research currently confronts a problem associated with high-throughput data acquisition technologies such as mass spectrometry, which have made it possible to detect and quantify a large variety of metabolite-derivative peaks simultaneously, but without appropriate assignment of metabolites to those peaks (Hall 2006). To assign metabolites to spectral peaks, we need to survey the natural products reported in the literature, which is a daunting data collection process. To make it feasible to link peak information to metabolites, we have developed a database of species-metabolite relations called KNApSAcK (Shinbo et al. 2004), which currently contains 49,165 species-metabolite relations involving 24,847 metabolites. At least three publicly available databases concern natural products: PubChem (Wheeler et al. 2006), KEGG (Kanehisa et al. 2008), and KNApSAcK (Shinbo et al. 2006). The PubChem database comprises records for over 19.6 million compounds with over 11 million unique structures, including small molecules, particularly diagnostic and therapeutic agents. It is inconvenient for assigning metabolites to spectral peaks, however, because it provides no information on the origin of compounds, such as whether they are synthetic or natural. In KEGG, metabolic pathways are constructed from interspecies gene relations such as orthologs and paralogs, so metabolite-species relations can be obtained via enzyme information. However, the KEGG database focuses mainly on metabolites related to known metabolic pathways and includes around 13,000 metabolites. In contrast, the KNApSAcK database systematically addresses the relationships between metabolites and their biological origins.
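To make concrete how species-metabolite relations can narrow the candidates for a spectral peak, the following is a minimal sketch and not the KNApSAcK implementation. The `Relation` record, the `candidate_metabolites` function, and the tolerance value are hypothetical names chosen for illustration, and the few records are toy entries that were not taken from the database. Matching on neutral monoisotopic mass is also a simplification, since GC-MS peaks usually correspond to fragment ions of derivatized compounds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relation:
    """One species-metabolite relation (hypothetical record layout)."""
    species: str
    metabolite: str
    mono_mass: float  # monoisotopic mass in Da


# Toy records for illustration only; NOT copied from KNApSAcK.
RELATIONS = [
    Relation("Coffea arabica", "caffeine", 194.0804),
    Relation("Camellia sinensis", "caffeine", 194.0804),
    Relation("Theobroma cacao", "theobromine", 180.0647),
    Relation("Camellia sinensis", "theobromine", 180.0647),
    Relation("Allium cepa", "quercetin", 302.0427),
]


def candidate_metabolites(
    mass: float, tol: float = 0.01, species: Optional[str] = None
) -> List[str]:
    """Return metabolite names whose mass lies within `tol` Da of `mass`.

    If `species` is given, only relations reported for that species are
    considered, which is the step a species-metabolite database enables.
    """
    hits = {
        r.metabolite
        for r in RELATIONS
        if abs(r.mono_mass - mass) <= tol
        and (species is None or r.species == species)
    }
    return sorted(hits)
```

Calling `candidate_metabolites(180.065, species="Theobroma cacao")` restricts the search to compounds reported for that species, whereas omitting `species` searches all records. A real workflow would need a tolerance suited to the instrument and would have to account for derivatization and fragmentation.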
The KNApSAcK database therefore makes it tractable to assign metabolites to spectral peaks. In the present study, we review the current status of the KNApSAcK database and its application to metabolomics.

Abstract

Since 2004, we have been developing a metabolite database of species-metabolite relations called KNApSAcK, which currently contains 49,165 species-metabolite relations incorporating 24,847 metabolites. In the present study, we report the current status of the KNApSAcK database and its application to the field of metabolomics, and propose a new algorithm for detecting fragmentation patterns in a complicated mixture such as a plant tissue, together with a new scheme for analyzing spectral information leading to peak annotation of GC-MS spectra. For samples from a wide variety of species beyond model species, the KNApSAcK DB has strong potential to contribute to metabolomics research, not only through simple metabolite search but also through further metabolomics analysis.
An approach to peak detection in GC-MS chromatograms and application of KNApSAcK database in prediction of candidate metabolites