Proteus mirabilis is a Gram-negative bacterium recognized for its unique swarming motility and urease activity. A previous proteomic report on four strains hypothesized that, unlike other Gram-negative bacteria, P. mirabilis may not exhibit significant intraspecies variation in gene content. However, there has not been a comprehensive analysis of large numbers of P. mirabilis genomes from various sources to support or refute this hypothesis. We performed comparative genomic analysis on 2,060 Proteus genomes. We sequenced the genomes of 893 isolates recovered from clinical specimens at three large US academic medical centers and combined them with 1,006 genomes from NCBI Assembly and 161 genomes assembled from Illumina reads in the public domain. We used average nucleotide identity (ANI) to delineate species and subspecies, core genome phylogenetic analysis to identify clusters of highly related P. mirabilis genomes, and pan-genome annotation to identify genes of interest not present in the model P. mirabilis strain HI4320. Within our cohort, Proteus comprises 10 named species and 5 uncharacterized genomospecies. P. mirabilis can be subdivided into three subspecies; subspecies 1 represents 96.7% (1,822/1,883) of all P. mirabilis genomes. The P. mirabilis pan-genome includes 15,399 genes absent from HI4320, and 34.3% (5,282/15,399) of these genes have no putative assigned function. Subspecies 1 is composed of several highly related clonal groups. Prophages and gene clusters encoding putatively extracellular-facing proteins are associated with clonal groups. The pan-genome also contains uncharacterized genes that are absent from the model strain P. mirabilis HI4320 but homologous to known virulence-associated operons.
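As a rough illustration of how pairwise ANI values can be turned into species and subspecies groups, the following minimal Python sketch performs single-linkage grouping at a chosen cutoff. The abstract does not state the cutoffs or the clustering procedure used in this study, so the 95% species boundary (a commonly used convention), the 98% subspecies boundary, the genome names, the ANI values, and the function `cluster_by_ani` are all assumptions made for this example.

```python
from itertools import combinations


def cluster_by_ani(genomes, ani, threshold):
    """Group genomes by single linkage: two genomes share a group
    whenever a chain of pairwise ANI values >= threshold connects them."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the group representative,
        # shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # ANI may be stored under either ordering of the pair.
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical genomes and ANI percentages, for illustration only.
genomes = ["g1", "g2", "g3", "g4"]
ani = {
    ("g1", "g2"): 98.9,
    ("g1", "g3"): 96.1,
    ("g2", "g3"): 96.3,
    ("g1", "g4"): 88.0,
    ("g2", "g4"): 87.5,
    ("g3", "g4"): 88.2,
}

species_groups = cluster_by_ani(genomes, ani, threshold=95.0)     # assumed species cutoff
subspecies_groups = cluster_by_ani(genomes, ani, threshold=98.0)  # assumed subspecies cutoff
print(species_groups)
print(subspecies_groups)
```

Single linkage is only one possible design choice; with real data, the ANI values would come from a dedicated tool, and borderline pairs near the cutoff can merge groups that a stricter method would keep apart.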
IMPORTANCE
Gram-negative bacteria use a variety of extracellular-facing factors to interact with eukaryotic hosts. Due to intraspecies genetic variability, these factors may not be present in the model strain for a given organism, potentially giving an incomplete understanding of host-microbe interactions. In contrast to previous reports on P. mirabilis, but similar to other Gram-negative bacteria, P. mirabilis has a mosaic genome with a linkage between phylogenetic position and accessory genome content. P. mirabilis encodes a variety of genes that may impact host-microbe dynamics beyond what is represented in the model strain HI4320. The diverse, whole-genome-characterized strain bank from this work can be used in conjunction with reverse genetic and infection models to better understand the impact of accessory genome content on bacterial physiology and pathogenesis of infection.