The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-)annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the closest E. coli-related species), including seven that we sequenced to completion. Within the ∼18,000 families of orthologous genes, we found ∼2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could make it difficult to develop a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. Overall, despite a very high gene flow, genes co-exist in an organised genome.
Background: Extraintestinal pathogenic Escherichia coli (ExPEC) strains represent a huge public health burden. Knowledge of their clonal diversity and of the association of clones with genomic content and clinical features is a prerequisite to recognize strains with a high invasive potential. In order to provide an unbiased view of the diversity of E. coli strains responsible for bacteremia, we studied 161 consecutive isolates from patients with positive blood culture obtained during one year in two French university hospitals. We collected precise clinical information, multilocus sequence typing (MLST) data and virulence gene content for all isolates. A subset representative of the clonal diversity was subjected to comparative genomic hybridization (CGH) using 2,324 amplicons from the flexible gene pool of E. coli.

Results: Recombination-insensitive phylogenetic analysis of MLST data in combination with the ECOR collection revealed that bacteremic E. coli isolates were highly diverse and distributed into five major lineages, corresponding to the classical E. coli phylogroups (A+B1, B2, D and E) and group F, which comprises strains previously assigned to D. Compared to other strains of phylogenetic group B2, strains belonging to MLST-derived clonal complexes (CCs) CC1 and CC4 were associated (P < 0.05) with a urinary origin. In contrast, no CC appeared associated with severe sepsis or unfavorable outcome of the bacteremia. CGH analysis revealed genomic characteristics of the distinct CCs and identified genomic regions associated with CC1 and/or CC4.

Conclusion: Our results demonstrate that human bacteremia strains distribute over the entire span of E. coli phylogenetic diversity and that CCs represent important phylogenetic units for pathogenesis and comparative genomics.