SeqSero, launched in 2015, is a software tool for Salmonella serotype determination from whole-genome sequencing (WGS) data. Despite its routine use in public health and food safety laboratories in the United States and other countries, the original SeqSero pipeline is relatively slow (minutes per genome using sequencing reads), is not optimized for draft genome assemblies, and may assign multiple serotypes for a strain. Here, we present SeqSero2 (github.com/denglab/SeqSero2; denglab.info/SeqSero2), an algorithmic transformation and functional update of the original SeqSero. Major improvements include (i) additional sequence markers for identification of Salmonella species and subspecies and certain serotypes, (ii) a k-mer based algorithm for rapid serotype prediction from raw reads (seconds per genome) and improved serotype prediction from assemblies, and (iii) a targeted assembly approach for specific retrieval of serotype determinants from WGS for serotype prediction, new allele discovery, and prediction troubleshooting. Evaluated using 5,794 genomes representing 364 common U.S. serotypes, including 2,280 human isolates of 117 serotypes from the National Antimicrobial Resistance Monitoring System, SeqSero2 is up to 50 times faster than the original SeqSero while maintaining equivalent accuracy for raw reads and substantially improving accuracy for assemblies. SeqSero2 further suggested that 3% of the tested genomes contained reads from multiple serotypes, indicating a use for contamination detection. In addition to short reads, SeqSero2 demonstrated potential for accurate and rapid serotype prediction directly from long nanopore reads despite base call errors. Testing of 40 nanopore-sequenced genomes of 17 serotypes yielded a single H antigen misidentification.
IMPORTANCE Serotyping is the basis of public health surveillance of Salmonella. It remains a first-line subtyping method even as surveillance continues to be transformed by whole-genome sequencing. SeqSero allows the integration of Salmonella serotyping into a whole-genome-sequencing-based laboratory workflow while maintaining continuity with the classic serotyping scheme. SeqSero2, informed by extensive testing and application of SeqSero in the United States and other countries, incorporates important improvements and updates that further strengthen its application in routine and large-scale surveillance of Salmonella by whole-genome sequencing.
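The abstract above mentions a k-mer based algorithm for predicting serotype from raw reads but does not describe it. As a rough illustration of the general idea only, and not SeqSero2's actual implementation, the following Python sketch scores candidate marker alleles by the fraction of their k-mers found in a set of reads. The allele names, sequences, reads, and the value of k are all hypothetical toy values chosen for the example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def score_alleles(reads, alleles, k=5):
    """Fraction of each allele's k-mers observed in the reads (either strand)."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
        read_kmers |= kmers(revcomp(read), k)
    scores = {}
    for name, seq in alleles.items():
        allele_kmers = kmers(seq, k)
        shared = len(allele_kmers & read_kmers)
        scores[name] = shared / len(allele_kmers) if allele_kmers else 0.0
    return scores


def best_allele(reads, alleles, k=5):
    """Return the highest-scoring allele name together with all scores."""
    scores = score_alleles(reads, alleles, k)
    return max(scores, key=scores.get), scores


# Hypothetical toy marker alleles (not real Salmonella sequences).
TOY_ALLELES = {
    "toy_allele_A": "ATGGCACAAGTCATTAATACAAAC",
    "toy_allele_B": "ATGGCTCAGGTTATCAACACTAAT",
}

# Toy reads drawn from allele A; the last one is on the reverse strand.
TOY_READS = [
    "ATGGCACAAG",
    "GGCACAAGTCATTAAT",
    revcomp("CATTAATACAAAC"),
]

if __name__ == "__main__":
    call, scores = best_allele(TOY_READS, TOY_ALLELES)
    print(call, scores)
```

A real tool would use a much larger k, a curated marker database, and handling for sequencing errors and mixed samples; the sketch leaves all of that out.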
This multi-agency report developed under the Interagency Collaboration for Genomics for Food and Feed Safety (Gen-FS) provides an overview of the use of and transition to Whole-Genome Sequencing (WGS) technology to detect and characterize pathogens commonly transmitted by food and to identify their sources. We describe foodborne pathogen analysis, investigation, and harmonization efforts among federal agencies, including the National Institutes of Health (NIH); the Department of Health and Human Services’ Centers for Disease Control and Prevention (CDC) and the Food and Drug Administration (FDA); and the U.S. Department of Agriculture’s Food Safety and Inspection Service (FSIS), Agricultural Research Service (ARS), and Animal and Plant Health Inspection Service (APHIS). We describe single nucleotide polymorphism (SNP), core-genome (cg) and whole-genome multi-locus sequence typing (wgMLST) data analysis methods as used in CDC’s PulseNet and FDA’s GenomeTrakr networks, underscoring the complementary nature of the results for linking genetically related foodborne pathogens during outbreak investigations while allowing flexibility to meet the specific needs of Gen-FS agency partners. We highlight how we apply WGS to pathogen characterization (virulence and antimicrobial resistance profiles) and source attribution efforts, and how we increase transparency by making the sequences and other data publicly available through the National Center for Biotechnology Information (NCBI). We also discuss the impact of current trends in the use of culture-independent diagnostic tests (CIDTs) for human diagnostic testing on analytical approaches related to food safety. Finally, we highlight what is next for WGS in food safety.
Species and subspecies within the Salmonella genus have been defined for public health purposes by biochemical properties; however, reference laboratories have increasingly adopted sequence-based, and especially whole genome sequence (WGS), methods for surveillance and routine identification. This leads to potential disparities in subspecies definitions, routine typing, and the ability to detect novel subspecies. A large-scale analysis of WGS data from the routine sequencing of clinical isolates was employed to define and characterise Salmonella subspecies population structure, demonstrating that the Salmonella species and subspecies were genetically distinct, including those previously identified through phylogenetic approaches, namely: S. enterica subspecies londinensis (VII), subspecies brasiliensis (VIII), subspecies hibernicus (IX) and subspecies essexiensis (X). The analysis also identified an additional novel subspecies, reptilium (XI). Further, these analyses indicated that S. enterica subspecies arizonae (IIIa) isolates were divergent from the other S. enterica subspecies, which clustered together; on the basis of ANI analysis, subspecies IIIa was sufficiently distinct to be classified as a separate species, S. arizonae. Multiple phylogenetic and statistical approaches generated congruent results, suggesting that the proposed species and subspecies structure was sufficiently biologically robust for routine application. Biochemical analyses demonstrated that not all subspecies were distinguishable by these means and that biochemical approaches did not capture the genomic diversity of the genus. We recommend the adoption of standardised genomic definitions of species and subspecies and a genome sequence-based approach to routine typing for the identification and definition of novel subspecies.