Advances in fecal source tracking (FST) are hindered
by limited knowledge of the genetic content and distribution
of fecally shed microbial populations. To address this gap, we performed
a systematic literature review and curated a large collection of genomes
(n = 26,018) representing fecally shed prokaryotic species across
broad and narrow source categories commonly implicated in FST studies
of recreational waters (i.e., cats, dogs, cows, seagulls, chickens,
pigs, birds, ruminants, human feces, and wastewater). The total
number of prokaryotic genomes recovered
from materials meeting our initial inclusion criteria varied substantially
across fecal sources, from none in seagulls to 9,085 in pigs. We examined
genome sequences recovered from these metagenomic and isolation-based
studies using comparative genomic approaches to characterize
trends across source categories and to produce finalized genome databases,
one per source category (n = 12,730 genomes), which are available online. On
average, 81% of the genomes representing species-level populations
occur only within a single source. Using fecal slurries to test the
performance of each source database, we report read capture rates
that vary with fecal source alpha diversity and database size. We
expect this resource to be useful to FST-related objectives, One Health
research, and sanitation efforts globally.
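
As a rough illustration of the two summary statistics reported above (the share of genomes whose species-level population occurs in only one source, and the read capture rate of a source database), the following minimal sketch shows one way such values could be computed. It is not the pipeline used in this study: the record layout, the function names, the toy data, and the reading of "on average" as a mean across source categories are all assumptions made for illustration.

```python
from collections import defaultdict


def exclusive_fraction_by_source(records):
    """For each source, return the fraction of its genomes whose species
    cluster is found in that source only.

    `records` is a hypothetical iterable of (genome_id, species, source)
    tuples; the layout is an assumption, not the study's data format.
    """
    records = list(records)

    # Collect the set of sources in which each species-level cluster occurs.
    sources_by_species = defaultdict(set)
    for _genome_id, species, source in records:
        sources_by_species[species].add(source)

    # Count, per source, all genomes and the single-source genomes.
    totals = defaultdict(int)
    exclusive = defaultdict(int)
    for _genome_id, species, source in records:
        totals[source] += 1
        if len(sources_by_species[species]) == 1:
            exclusive[source] += 1

    return {source: exclusive[source] / totals[source] for source in totals}


def read_capture_rate(mapped_reads, total_reads):
    """Fraction of reads from a sample that map to a source database."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


# Toy records (invented for illustration only).
toy_records = [
    ("g1", "sp_A", "pig"),
    ("g2", "sp_A", "pig"),
    ("g3", "sp_B", "pig"),
    ("g4", "sp_B", "cow"),
    ("g5", "sp_C", "cow"),
    ("g6", "sp_D", "dog"),
]

fractions = exclusive_fraction_by_source(toy_records)
mean_fraction = sum(fractions.values()) / len(fractions)
```

In this toy example, `sp_B` is shared between pigs and cows, so genomes `g3` and `g4` do not count as source-exclusive; `mean_fraction` is then the unweighted mean of the per-source fractions.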