Urine culture and microscopy techniques are used to profile the bacterial species present in urinary tract infections. To gain insight into the urinary flora in infection and health, we analyzed clinical laboratory features and the microbial metagenome of 121 clean-catch urine samples. 16S rDNA gene signatures were successfully obtained for 116 participants, while whole genome shotgun sequencing data were successfully generated for samples from 49 participants. Analysis of these datasets supports the definition of patterns of infection and of colonization or contamination. Although 16S rDNA sequencing was more sensitive, whole genome shotgun sequencing allowed for a more comprehensive and unbiased representation of the microbial flora, including eukaryal and viral pathogens, and of bacterial virulence factors. Urine samples positive by whole genome shotgun sequencing contained a plethora of bacterial (median 41 genera/sample), eukaryal (median 2 species/sample) and viral (median 3 viruses/sample) sequences. Genomic analyses revealed cases of infection with potential pathogens (e.g., Alloscardovia sp, Actinotignum sp, Ureaplasma sp) that are often missed during routine urine culture because of species-specific growth requirements. We also observed gender differences in the microbial metagenome. While conventional microbiological methods are inadequate to identify the large diversity of microbial species present in urine, genomic approaches appear to comprehensively and quantitatively describe the urinary microbiome.
Results
Clinical and laboratory data representation. To support an unbiased assessment of the clinical nature of the specimens, we analyzed the urine sample laboratory and microbiology data using dimensionality reduction by principal component analysis (PCA) and k-medoids clustering. The PCA representation of the clinical laboratory data is presented in Fig 1. The first two components (PC1, PC2) explained 65% of the variance in the clinical laboratory dataset. PC1 was driven by the vaginal contamination score (VCO); PC2 was driven primarily by the neutrophil activation and degranulation score (NAD), and secondarily by the erythrocyte and vascular injury score (ERY) and the presence of red blood cells (RBC) and leukocytes (WBC) (Fig 1B). Partitioning around medoids clustering resulted in three clusters, with 9 individuals in Cluster #1, 63 individuals in Cluster #2, and 49 individuals in Cluster #3 (Fig 1C). From these data, we established a preliminary definition of Cluster #1 as likely representing urine from non-infected individuals, while Clusters #2 and #3 are consistent with separate manifestations of infectious and inflammatory processes of the urinary tract. The performance of 16S rDNA and whole genome shotgun sequencing across clinical laboratory clusters is presented in Table 1.
16S rDNA sequencing. 16S rDNA sequencing was successful for 116 (96%) samples (Table 1). The median (range) number of genera identified per individual was 38 (6-220). The median (range) number of genera varied acro...