What can we learn from over 100,000 <i>Escherichia coli</i> genomes?

Abram, Kaleb Z.; Udaondo, Zulema; Bleker, Carissa; Wanchai, Visanu; Wassenaar, Trudy M.; Robeson, Michael S.; Ussery, David W.

doi:10.1101/708131

Cited by 11 publications

(15 citation statements)

References 39 publications


“…For example, in Salmonella and Klebsiella genomes, GAP proteins were identified and in a number of Enterobacter species (mainly cloacae) class C proteins were found. In contrast in the Escherichia coli species, despite being a broad taxonomic group (Abram et al., 2020), only class B acid phosphatases were identified (Supplementary Tables 2A–C).…”

Section: Results (mentioning)

confidence: 99%

Developing robust protein analysis profiles to identify bacterial acid phosphatases in genomes and metagenomic libraries

Udaondo, Duque, Daddaoua, et al. 2020. Environmental Microbiology

Self Cite


Summary: Phylogenetic analysis of more than 4000 annotated bacterial acid phosphatases was carried out. Our analysis enabled us to sort these enzymes into the following three types: (1) class B acid phosphatases, which were distantly related to the other types, (2) class C acid phosphatases and (3) generic acid phosphatases (GAP). Although class B phosphatases are found in a limited number of bacterial families, which include known pathogens, class C acid phosphatases and GAP proteins are found in a variety of microbes that inhabit soil, fresh water and marine environments. As part of our analysis, we developed three profiles, named Pfr‐B‐Phos, Pfr‐C‐Phos and Pfr‐GAP, to describe the three groups of acid phosphatases. These sequence‐based profiles were then used to scan genomes and metagenomes to identify a large number of formerly unknown acid phosphatases. A number of proteins in databases annotated as hypothetical proteins were also identified by these profiles as putative acid phosphatases. To validate these in silico results, we cloned genes encoding candidate acid phosphatases from genomic DNA or recovered from metagenomic libraries or genes synthesized in vitro based on protein sequences recovered from metagenomic data. Expression of a number of these genes, followed by enzymatic analysis of the proteins, further confirmed that sequence similarity searches using our profiles could successfully identify previously unknown acid phosphatases.


“…Even more, describing the gene content by comparing whole-genomic datasets is a much harder problem, which cannot realistically be provided in a high quality in an automated manner across increasing dataset sizes. Therefore, studies on E. coli in recent years have either been detailed and focused only on a single pathotype [20, 23-25] or, when utilizing a very large number of genomes, the analyses were limited in their resolution due to the complexity of extracting the information from such large collections [3, 69]. Taken together, the collection presented here represents a detailed, high-quality and accessible dataset that will enable researchers to apply comprehensive comparisons in future investigations on E. coli.…”

Section: Discussion (mentioning)

confidence: 99%

A comprehensive and high-quality collection of Escherichia coli genomes and their genes

et al. 2021


Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. All these have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data is widely available, accessing the information in order to perform analyses can still be a challenge. Collecting relevant available data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing to apply various analyses and extract useful information. In this study, we collated and intensely curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasizes our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen.


“…The different typing approaches showed that the two isolates were genotypically dissimilar; at core-genome level (cgMLST-analysis) the isolates showed a distance of 2362 alleles to each other. Isolate 803-18 was assigned to phylogenetic group B1, serotype H45, sequence type ST155 and cgMLST-based complex type CT7500; isolate 844-18 was identified as phylogenetic group D, serotype O15:H18, ST69 and CT7508 (Table 1). Group B1 is known to mainly comprise environmental and animal isolates, whereas phylogenetic group D is known to include more (urogenital-) pathogenic E. coli [24]. This result seems to be concordant with MLST, since E. coli-ST155 has been described as sequence type with zoonotic potential and plasmid-mediated spread of antibiotic resistance, whereas E. coli-ST69 was described as a pandemic and pathogenic lineage [25, 26].…”

Section: WGS-based Typing (mentioning)

confidence: 99%

Genome sequences of two clinical Escherichia coli isolates harboring the novel colistin-resistance gene variants mcr-1.26 and mcr-1.27

Neumann, Rackwitz, Hunfeld, et al. 2020. Gut Pathog


Background: Colistin is still a widely used antibiotic in veterinary medicine although it is a last-line treatment option for hospitalized patients with infections caused by multidrug-resistant Gram-negative bacteria. Colistin resistance has gained additional importance since the recent emergence of mobile colistin resistance (mcr) genes. In the scope of a study on colistin resistance in clinical Escherichia coli isolates from human patients in Germany we characterized the mcr-1 gene variants. Results: Our PCR-based screening for mcr-carrying E. coli from German patients revealed the presence of mcr-1-like genes in 60 isolates. Subsequent whole-genome sequence-based analyses detected one non-synonymous mutation in the mcr-1 gene for two isolates. The mutations were verified by Sanger sequencing and resulted in amino acid changes Met1Thr (isolate 803-18) and Tyr9Cys (isolate 844-18). Genotyping revealed no relationship between the isolates. The two clinical isolates were assigned to sequence types ST155 (isolate 803-18) and ST69 (isolate 844-18). Both mcr-1 variants were found to be located on IncX4 plasmids of 33 kb size; these plasmids were successfully conjugated into sodium azide resistant E. coli J53 AziR in a broth mating experiment. Conclusions: Here we present the draft sequences of E. coli isolate 803-18 carrying the novel variant mcr-1.26 and isolate 844-18 carrying the novel variant mcr-1.27. The results highlight the increasing issue of transferable colistin resistance.
