Escherichia coli is a commensal of birds and mammals, including humans. It can act as an opportunistic pathogen and is also found in water and sediments. Since most population studies have focused on clinical isolates, we studied the phylogeny, genetic diversification, and habitat association of 1,294 isolates representative of the phylogenetic diversity of more than 5,000, mostly non-clinical, isolates originating from humans, poultry, wild animals and water sampled from the Australian continent. These strains represent the species diversity and show large variations in gene repertoires within sequence types. Recent gene transfer is driven by mobile elements and determined by habitat sharing and by phylogroup membership, suggesting that gene flow reinforces the association of certain genetic backgrounds with specific habitats. The phylogroups with the smallest genomes had the highest rates of gene repertoire diversification and fewer but more diverse mobile genetic elements, suggesting that smaller genomes are associated with higher, not lower, turnover of genetic information. Many of these small genomes were in freshwater isolates, suggesting that some lineages are specifically adapted to this environment. Altogether, these data help explain why epidemiological clones tend to emerge from specific phylogenetic groups in the presence of pervasive horizontal gene transfer across the species.
circulation of strains and the high plasticity of their genomes have not erased the associations of certain clades with certain isolation sources. In consequence, such associations might reflect local adaptation 16,45, which would suggest frequent genetic interactions between the novel adaptive changes and the strains' genomic background.
Understanding how the evolution of gene repertoires is shaped by population structure and habitats requires large-scale comparative genomics of samples with diverse sources of isolation representative of natural populations of E. coli. Most genome sequencing efforts have been devoted to studying pathogenic lineages, and very few genomic data are available for commensal strains, especially in wild animals, and for environmental strains. Here, we analysed the genomes of a large collection of E. coli strains collected across many human, domestic and wild animal, and environmental sources in different geographic locations from the Australian continent. This collection is dominated by non-clinical isolates, corresponding to the main habitats of the species. We sought to understand the dynamics of the evolution of gene repertoires and how it was driven by mobile genetic elements. The analysis of the isolation sources in the light of phylogenetic structure and genome variation suggests that adaptation varies with the habitat and the phylogenomic background. This helps explain why known epidemiological clones of the species emerge from specific phylogenetic groups, even though virulen...