GapMind is a Web-based tool for annotating amino acid biosynthesis in bacteria and archaea (http://papers.genomics.lbl.gov/gaps). GapMind incorporates many variant pathways and 130 different reactions, and it analyzes a genome in just 15 s. To avoid error-prone transitive annotations, GapMind relies primarily on a database of experimentally characterized proteins. GapMind correctly handles fusion proteins and split proteins, which often cause errors for best-hit approaches. To improve GapMind’s coverage, we examined genetic data from 35 bacteria that grow in defined media without amino acids, and we filled many gaps in amino acid biosynthesis pathways. For example, we identified additional genes for arginine synthesis with succinylated intermediates in Bacteroides thetaiotaomicron, and we propose that Dyella japonica synthesizes tyrosine from phenylalanine. Nevertheless, for many bacteria and archaea that grow in minimal media, genes for some steps still cannot be identified. To help interpret potential gaps, GapMind checks if they match known gaps in related microbes that can grow in minimal media. GapMind should aid the identification of microbial growth requirements.
IMPORTANCE Many microbes can make all of the amino acids (the building blocks of proteins). In principle, we should be able to predict which amino acids a microbe can make, and which it requires as nutrients, by checking its genome sequence for all of the necessary genes. However, in practice, it is difficult to check for all of the alternative pathways. Furthermore, new pathways and enzymes are still being discovered. We built an automated tool, GapMind, to annotate amino acid biosynthesis in bacterial and archaeal genomes. We used GapMind to list gaps: cases where a microbe makes an amino acid but a complete pathway cannot be identified in its genome. We used these gaps, together with data from mutants, to identify new pathways and enzymes. However, for most bacteria and archaea, we still do not know how they can make all of the amino acids.