We conducted a global characterization of the microbial communities of shipping ports, which serve as a novel system for investigating microbial biogeography. Port microbial communities from marine and freshwater habitats are composed of relatively similar phyla, despite spanning large spatial scales. As part of this project, we collected 1,218 surface water samples from 604 locations across eight countries and three continents, cataloguing a total of 20 shipping ports distributed across the East and West Coasts of the United States, Europe, and Asia. This represents the largest study of port-associated microbial communities to date. Here, we demonstrate the utility of machine learning for leveraging this robust system to characterize microbial biogeography by identifying trends in biodiversity across broad spatial scales. We found that, for geographic locations sharing similar environmental conditions, subpopulations from the dominant phyla of these habitats (Actinobacteria, Bacteroidetes, Cyanobacteria, and Proteobacteria) can be used to differentiate the 20 globally distributed geographic locations. These results suggest that, despite the overwhelming diversity within microbial communities, members of the most abundant and ubiquitous microbial groups in the system can be used to differentiate geospatial locations across global spatial scales. Our study provides insight into how microbes are dispersed spatially and offers robust methods for interrogating microbial biogeography.
IMPORTANCE Microbes are ubiquitous throughout the world and are highly diverse. Characterizing the extent of variation in microbial diversity across large geographic scales is challenging, yet it can reveal much about microbial populations and their behavior. Machine learning approaches have been used mostly to examine the human microbiome and, to some extent, microbial communities from the environment. Here, we show how supervised machine learning approaches can be used to understand microbial biodiversity and biogeography, using microbes from globally distributed shipping ports. Our findings indicate that members of globally dominant phyla are important for differentiating locations, which reduces the reliance on rare taxa to probe geography. Further, this study shows how global biogeographic patterning of aquatic microbial communities (and other systems) can be assessed through populations of the highly abundant and ubiquitous taxa that dominate the system.