Temperate phages have highly variable, mosaic genomes. They infect a bacterial host, integrate into the host's genome or replicate as low-copy-number plasmids, and are regulated to switch from the lysogenic to the lytic cycle to generate new virions and escape their host. Genomes from most bacterial phyla contain at least one prophage. We updated our PhiSpy algorithm to improve detection of prophages and to provide a web-based framework for PhiSpy. We have used this algorithm to identify 36,488 prophage regions from 11,941 bacterial genomes, including almost 600 prophages with no known homology to any proteins. Transfer RNA genes were abundant in the prophages; many of these tRNAs alleviate the limits on translation efficiency imposed by host codon bias and presumably enable phages to surpass the normal capacity of the hosts' translation machinery. We identified integrase genes in 15,765 prophages (43% of the prophages). The integrase was routinely located at either end of the integrated phage genome, and we used it to orient and align prophage genomes to reveal their underlying organization. The conserved genome alignments of phages recapitulate the early, middle, and late gene order seen in transcriptional control of phage genes, and demonstrate that gene order, presumably selected by transcription timing and/or coordination among functional modules, has been stably conserved throughout phage evolution.
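The integrase-based orientation step can be illustrated with a minimal sketch. It is not the PhiSpy implementation: the `Gene` record, the `orient_by_integrase` function, the 0-based half-open coordinates, and the rule of matching the word "integrase" in the product annotation are all assumptions made for illustration. The idea is only that a prophage whose integrase lies in the right half is reverse-complemented so that every prophage is compared with its integrase at the left end.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    start: int    # 0-based start within the prophage region (assumed convention)
    end: int      # exclusive end within the prophage region
    strand: int   # +1 or -1
    product: str  # free-text annotation


def orient_by_integrase(genes: List[Gene], length: int) -> List[Gene]:
    """Return genes sorted by position, flipped if needed so the integrase is on the left.

    Hypothetical helper: if no gene is annotated as an integrase, the
    genes are returned in their original orientation.
    """
    integrases = [g for g in genes if "integrase" in g.product.lower()]
    if not integrases:
        return sorted(genes, key=lambda g: g.start)

    # Use the integrase that lies closest to either end of the region.
    anchor = min(integrases, key=lambda g: min(g.start, length - g.end))
    midpoint = (anchor.start + anchor.end) / 2
    if midpoint <= length / 2:
        return sorted(genes, key=lambda g: g.start)

    # Reverse-complement the region: mirror coordinates and swap strands.
    flipped = [
        Gene(start=length - g.end, end=length - g.start,
             strand=-g.strand, product=g.product)
        for g in genes
    ]
    return sorted(flipped, key=lambda g: g.start)
```

Once every prophage is oriented the same way, gene order can be compared position by position across genomes, which is what makes conserved early, middle, and late modules visible.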
Conclusions
Here we presented an analysis of over 11,000 bacterial genomes, from which we identified 36,488 prophages. Many phages appear to be limited by the initiation of translation by the host's machinery, and may increase translation rates by carrying their own tRNA genes, effectively increasing the availability of both methionine-loaded tRNAs and peptide deformylase. We have also demonstrated that phages maintain a highly conserved gene order, which suggests that phage genome mosaicism is limited to clusters of conserved genes rather than individual genes.