“…As the shorter read length of pyrosequencing data affected the detection of protein localization, a comparison of each 0.1 µm 454 data set with the corresponding Sanger data set was performed. Using the assumption that percentage localizations calculated using PSORT (Gardy et al, 2005) should be the same between the two data sets, a correction factor was computed and applied. Genome size was inferred with GAAS (Angly et al, 2009) and the average number of rRNA operons was estimated by dividing the number of 16S rRNA genes detected in each sample by the average number of proteins assigned to COGs of 32 single-copy genes (COG0012, COG0016, COG0048, COG0049, COG0052, COG0080, COG0081, COG0087, COG0088, COG0090, COG0091, COG0092, COG0093, COG0094, COG0096, COG0097, COG0098, COG0099, COG0100, COG0102, COG0103, COG0124, COG0184, COG0185, COG0186, COG0197, COG0200, COG0201, COG0256, COG0522, COG0533 and COG0541).…”
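
The rRNA operon estimate described in the passage is a simple ratio, and a minimal sketch may make the arithmetic concrete. The function name, the input layout (a dictionary of per-COG protein counts for one sample) and the choice to count a missing COG as zero are assumptions made for illustration; the passage does not say how the authors implemented the calculation or handled undetected COGs.

```python
from statistics import mean

# The 32 single-copy COGs listed in the passage.
SINGLE_COPY_COGS = [
    "COG0012", "COG0016", "COG0048", "COG0049", "COG0052", "COG0080",
    "COG0081", "COG0087", "COG0088", "COG0090", "COG0091", "COG0092",
    "COG0093", "COG0094", "COG0096", "COG0097", "COG0098", "COG0099",
    "COG0100", "COG0102", "COG0103", "COG0124", "COG0184", "COG0185",
    "COG0186", "COG0197", "COG0200", "COG0201", "COG0256", "COG0522",
    "COG0533", "COG0541",
]


def estimate_rrna_operons(n_16s_genes, cog_counts):
    """Estimate the average number of rRNA operons per genome in a sample.

    n_16s_genes: number of 16S rRNA genes detected in the sample.
    cog_counts:  dict mapping COG id -> number of proteins assigned to it.

    Assumption (not stated in the passage): a single-copy COG that is
    absent from cog_counts contributes a count of zero to the average.
    """
    per_cog = [cog_counts.get(cog, 0) for cog in SINGLE_COPY_COGS]
    avg_single_copy = mean(per_cog)  # proxy for the number of genomes sampled
    if avg_single_copy == 0:
        raise ValueError("no single-copy COG proteins detected in this sample")
    return n_16s_genes / avg_single_copy


if __name__ == "__main__":
    # Hypothetical counts, invented purely to show the call.
    example_counts = {cog: 50 for cog in SINGLE_COPY_COGS}
    print(estimate_rrna_operons(95, example_counts))
```

The average single-copy COG count serves as a stand-in for the number of genome equivalents in the sample, so the ratio reads as 16S rRNA genes per genome.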