Coexisting microbial cells of the same species often exhibit genetic differences that can affect phenotypes ranging from nutrient preference to pathogenicity. Here we present inStrain, a program that utilizes metagenomic paired reads to profile intra-population genetic diversity (microdiversity) across whole genomes and compare populations in a microdiversity-aware manner, dramatically increasing genomic comparison accuracy when benchmarked against existing methods. We use inStrain to profile >1,000 fecal metagenomes from newborn premature infants and find that siblings share significantly more strains than unrelated infants, although identical twins share no more strains than fraternal siblings. Infants born via cesarean section harbored Klebsiella with significantly higher nucleotide diversity than infants delivered vaginally, potentially reflecting acquisition from hospital versus maternal microbiomes. Genomic loci showing diversity within an infant included variants found in other infants, possibly reflecting inoculation from diverse hospital-associated sources. InStrain can be applied to any metagenomic dataset for microdiversity analysis and rigorous strain comparison.
Main
Cells in microbial populations are not all identical to one another. Genetic polymorphisms rapidly arise through de novo mutation, and these variants can spread because they confer a fitness advantage or by lateral gene transfer (if the variant confers an advantage or is linked to a fitness-conferring variant). It is estimated that billions to trillions of bacterial genetic mutations are generated de novo every day in the microbiome of an individual adult human [1], and these differences can be clinically relevant. For example, just three point mutations can confer antibiotic resistance in Enterobacteriaceae [2]. Studying genetic variation in microbial populations has historically involved isolating a multitude of cells from the same population and performing phenotypic analysis and/or genome sequencing. Genome-resolved metagenomic analysis, which involves extracting and sequencing DNA directly from the environment and using computational tools to assemble and bin the resulting DNA sequences into genomes in silico, presents an attractive high-throughput alternative to this process. This technique allows simultaneous analysis of microbial communities, the species populations that comprise them, and heterogeneity within these populations, and has been used to reveal fine-scale evolutionary mechanisms [3-5], dynamics [6-12], and strain-level metabolic variation that could contribute to strain selection [1,13].
Many fundamental questions in human microbiome research relate to the transmission of microbial populations between individuals, including how we are seeded by microbes early in life [14-16]. However, strain diversity presents challenges for such analyses. Sequence comparisons are usually performed by aligning consensus genomes assembled from different samples [1,17] or by modifying a reference genome using mapped reads and comparing it...