2019
DOI: 10.1101/708131
Preprint

What can we learn from over 100,000 Escherichia coli genomes?

Abstract: The explosion of microbial genome sequences in public databases allows for large-scale population studies of model organisms, such as Escherichia coli. We have examined more than one hundred thousand E. coli and Shigella genomes. After removing outliers, genomes were classified into two broad clusters based on a semi-automated Mash analysis, which distinguished 14 distinct phylotypes, graphically illustrated by Cytoscape. From a set of more than ten thousand good quality E. coli and Shigella genomes from GenBan…

Cited by 11 publications (15 citation statements)
References 39 publications
“…For example, in Salmonella and Klebsiella genomes, GAP proteins were identified and in a number of Enterobacter species (mainly cloacae) class C proteins were found. In contrast in the Escherichia coli species, despite being a broad taxonomic group (Abram et al., 2020), only class B acid phosphatases were identified (Supplementary Tables 2A–C).…”
Section: Results (mentioning)
confidence: 99%
“…Even more, describing the gene content by comparing whole-genomic datasets is a much harder problem, which cannot realistically be provided in a high quality in an automated manner across increasing dataset sizes. Therefore, studies on E. coli in recent years have either been detailed and focused only on a single pathotype [20, 23–25] or, when utilizing a very large number of genomes, the analyses were limited in their resolution due to the complexity of extracting the information from such large collections [3, 69]. Taken together, the collection presented here represents a detailed, high-quality and accessible dataset that will enable researchers to apply comprehensive comparisons in future investigations on E. coli.…”
Section: Discussion (mentioning)
confidence: 99%
“…The different typing approaches showed that the two isolates were genotypically dissimilar; at core-genome level (cgMLST-analysis) the isolates showed a distance of 2362 alleles to each other. Isolate 803-18 was assigned to phylogenetic group B1, serotype H45, sequence type ST155 and cgMLST-based complex type CT7500; isolate 844-18 was identified as phylogenetic group D, serotype O15:H18, ST69 and CT7508 (Table 1) group B1 is known to mainly comprise environmental and animal isolates, whereas phylogenetic group D is known to include more (urogenital-) pathogenic E. coli [24]. This result seems to be concordant with MLST, since E. coli-ST155 has been described as sequence type with zoonotic potential and plasmid-mediated spread of antibiotic resistance, whereas E. coli-ST69 was described as a pandemic and pathogenic lineage [25,26].…”
Section: WGS-based Typing (mentioning)
confidence: 99%