EnteroBase is an integrated software environment that supports the identification of global population structures within several bacterial genera that include pathogens. Here, we provide an overview of how EnteroBase works, what it can do, and its future prospects. EnteroBase has currently assembled more than 300,000 genomes from Illumina short reads from Salmonella, Escherichia, Yersinia, Clostridioides, Helicobacter, Vibrio, and Moraxella and genotyped those assemblies by core genome multilocus sequence typing (cgMLST). Hierarchical clustering of cgMLST sequence types allows mapping a new bacterial strain to predefined population structures at multiple levels of resolution within a few hours after uploading its short reads. Case Study 1 illustrates this process for local transmissions of Salmonella enterica serovar Agama between neighboring social groups of badgers and humans. EnteroBase also supports single nucleotide polymorphism (SNP) calls both from genomic assemblies and after extraction from metagenomic sequences, as illustrated by Case Study 2, which summarizes the microevolution of Yersinia pestis over the last 5000 years of pandemic plague. EnteroBase can also provide a global overview of the genomic diversity within an entire genus, as illustrated by Case Study 3, which presents a novel, global overview of the population structure of all of the species, subspecies, and clades within Escherichia.
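The idea of placing a strain into clusters at several levels of resolution can be illustrated with a small sketch. The abstract does not state which clustering algorithm EnteroBase uses, so the single-linkage rule, the thresholds, the function names, and the six-locus toy profiles below are all assumptions made for illustration only.

```python
from itertools import combinations

def allelic_distance(a, b):
    """Count loci whose allele numbers differ between two equal-length profiles."""
    return sum(1 for x, y in zip(a, b) if x != y)

def single_linkage_clusters(profiles, threshold):
    """Assumed rule: two strains share a cluster if a chain of links,
    each with at most `threshold` allelic differences, connects them."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent pointers to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the alphabetically smaller name as the cluster label.
                parent[max(ra, rb)] = min(ra, rb)
    return {name: find(name) for name in profiles}

def hierarchical_assignment(profiles, thresholds):
    """Cluster the same profiles once per threshold, from fine to coarse."""
    return {t: single_linkage_clusters(profiles, t) for t in thresholds}

# Hypothetical toy profiles (real cgMLST schemes have thousands of loci).
profiles = {
    "s1": [1, 1, 1, 1, 1, 1],
    "s2": [1, 1, 1, 1, 1, 2],
    "s3": [1, 1, 1, 2, 2, 2],
    "s4": [5, 5, 5, 5, 5, 5],
}
levels = hierarchical_assignment(profiles, thresholds=[0, 1, 3])
```

In this toy data, `s1` and `s2` merge at the threshold of one difference, `s3` joins them only at the coarser threshold of three, and `s4` stays separate at every level, which is the sense in which one strain carries a cluster label per level of resolution.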
Salmonella enterica subspecies enterica is traditionally subdivided into serovars by serological and nutritional characteristics. We used Multilocus Sequence Typing (MLST) to assign 4,257 isolates from 554 serovars to 1,092 sequence types (STs). The majority of the isolates and many STs were grouped into 138 genetically closely related clusters called eBurstGroups (eBGs). Many eBGs correspond to a serovar, for example, most Typhimurium are in eBG1 and most Enteritidis are in eBG4, but many eBGs contained more than one serovar. Furthermore, most serovars were polyphyletic and are distributed across multiple unrelated eBGs. Thus, serovar designations confounded genetically unrelated isolates and failed to recognize natural evolutionary groupings. An inability of serotyping to correctly group isolates was most apparent for Paratyphi B and its variant Java. Most Paratyphi B were included within a sub-cluster of STs belonging to eBG5, which also encompasses a separate sub-cluster of Java STs. However, diphasic Java variants were also found in two other eBGs and monophasic Java variants were in four other eBGs or STs, one of which is in subspecies salamae and a second of which includes isolates assigned to Enteritidis, Dublin and monophasic Paratyphi B. Similarly, Choleraesuis was found in eBG6 and is closely related to Paratyphi C, which is in eBG20. However, Choleraesuis var. Decatur consists of isolates from seven other, unrelated eBGs or STs. The serological assignment of these Decatur isolates to Choleraesuis likely reflects lateral gene transfer of flagellar genes between unrelated bacteria plus purifying selection. By confounding multiple evolutionary groups, serotyping can be misleading about the disease potential of S. enterica. Unlike serotyping, MLST recognizes evolutionary groupings and we recommend that Salmonella classification by serotyping should be replaced by MLST or its equivalents.
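The comparison between serovars and eBGs described above amounts to a cross-tabulation: a serovar found in several groups is a candidate polyphyletic serovar, and a group holding several serovars shows where serotyping splits one lineage. The following is a minimal sketch of that bookkeeping; the function names and the placeholder records are hypothetical and are not data from the study.

```python
from collections import defaultdict

def cross_tabulate(records):
    """Index (serovar, group) pairs in both directions."""
    groups_by_serovar = defaultdict(set)
    serovars_by_group = defaultdict(set)
    for serovar, group in records:
        groups_by_serovar[serovar].add(group)
        serovars_by_group[group].add(serovar)
    return groups_by_serovar, serovars_by_group

def discordant(mapping):
    """Keep only keys associated with more than one value."""
    return {key: sorted(values) for key, values in mapping.items() if len(values) > 1}

# Hypothetical placeholder isolates: (serovar label, genetic group label).
records = [
    ("serovarA", "group1"),
    ("serovarA", "group1"),
    ("serovarB", "group2"),
    ("serovarB", "group3"),  # serovarB appears in two unrelated groups
    ("serovarC", "group2"),  # group2 holds two serovars
]

groups_by_serovar, serovars_by_group = cross_tabulate(records)
split_serovars = discordant(groups_by_serovar)
mixed_groups = discordant(serovars_by_group)
```

Here `split_serovars` reports `serovarB` as spread over `group2` and `group3`, while `mixed_groups` reports `group2` as containing both `serovarB` and `serovarC`.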
For many decades, Salmonella enterica has been subdivided by serological properties into serovars or further subdivided for epidemiological tracing by a variety of diagnostic tests with higher resolution. Recently, it has been proposed that so-called eBurst groups (eBGs) based on the alleles of seven housekeeping genes (legacy multilocus sequence typing [MLST]) corresponded to natural populations and could replace serotyping. However, this approach lacks the resolution needed for epidemiological tracing and the existence of natural populations had not been validated by independent criteria. Here, we describe EnteroBase, a web-based platform that assembles draft genomes from Illumina short reads in the public domain or that are uploaded by users. EnteroBase implements legacy MLST as well as ribosomal gene MLST (rMLST), core genome MLST (cgMLST), and whole genome MLST (wgMLST) and currently contains over 100,000 assembled genomes from Salmonella. It also provides graphical tools for visual interrogation of these genotypes and those based on core single nucleotide polymorphisms (SNPs). eBGs based on legacy MLST are largely consistent with eBGs based on rMLST, thus demonstrating that these correspond to natural populations. rMLST also facilitated the selection of representative genotypes for SNP analyses of the entire breadth of diversity within Salmonella. In contrast, cgMLST provides the resolution needed for epidemiological investigations. These observations show that genomic genotyping, with the assistance of EnteroBase, can be applied at all levels of diversity within the Salmonella genus.
The widespread use of antibiotics in association with high-density clinical care has driven the emergence of drug-resistant bacteria that are adapted to thrive in hospitalized patients. Of particular concern are globally disseminated methicillin-resistant Staphylococcus aureus (MRSA) clones that cause outbreaks and epidemics associated with health care. The most rapidly spreading and tenacious health-care-associated clone in Europe currently is EMRSA-15, which was first detected in the UK in the early 1990s and subsequently spread throughout Europe and beyond. Using phylogenomic methods to analyze the genome sequences for 193 S. aureus isolates, we were able to show that the current pandemic population of EMRSA-15 descends from a health-care-associated MRSA epidemic that spread throughout England in the 1980s, which had itself previously emerged from a primarily community-associated methicillin-sensitive population. The emergence of fluoroquinolone resistance in this EMRSA-15 subclone in the English Midlands during the mid-1980s appears to have played a key role in triggering pandemic spread, and occurred shortly after the first clinical trials of this drug. Genome-based coalescence analysis estimated that the population of this subclone over the last 20 years has grown four times faster than its progenitor. Using comparative genomic analysis we identified the molecular genetic basis of 99.8% of the antimicrobial resistance phenotypes of the isolates, highlighting the potential of pathogen genome sequencing as a diagnostic tool. We document the genetic changes associated with adaptation to the hospital environment and with increasing drug resistance over time, and how MRSA evolution likely has been influenced by country-specific drug use regimens.
Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static "GrapeTree Layout" algorithm which supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel minimum spanning tree algorithm (MSTree V2) to reconstruct genetic relationships despite high levels of missing data. GrapeTree is a stand-alone package for investigating phylogenetic trees plus associated metadata, and is also integrated into EnteroBase to facilitate cutting-edge navigation of genomic relationships among bacterial pathogens.
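To make the notion of a minimum spanning tree over allelic profiles concrete, here is a minimal sketch using the classical Prim algorithm. This is not MSTree V2, whose details the abstract does not give. The treatment of missing data (skipping any locus that is absent from either profile), the use of `None` as the missing marker, and the toy profiles are assumptions for illustration.

```python
def allelic_distance(a, b):
    """Count differing loci, skipping loci missing (None) in either profile."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )

def minimum_spanning_tree(profiles):
    """Classical Prim algorithm; returns a list of (from, to, distance) edges."""
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Find the cheapest edge from the current tree to an outside node.
        for u in in_tree:
            for v in names:
                if v in in_tree:
                    continue
                d = allelic_distance(profiles[u], profiles[v])
                if best is None or d < best[0]:
                    best = (d, u, v)
        edges.append((best[1], best[2], best[0]))
        in_tree.add(best[2])
    return edges

# Hypothetical four-locus profiles; None marks a locus with no allele call.
profiles = {
    "a": [1, 1, 1, 1],
    "b": [1, 1, 1, 2],
    "c": [1, None, 2, 2],
    "d": [3, 3, None, 2],
}
tree = minimum_spanning_tree(profiles)
total_weight = sum(weight for _, _, weight in tree)
```

Skipping missing loci in this naive way makes incomplete profiles look closer to everything else than they may really be, which is one reason a plain minimum spanning tree can mislead when missing data are common.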