Listeria monocytogenes is a ubiquitous bacterium that may cause the foodborne illness listeriosis. Only a small amount of data about the population genetic structure of strains isolated from food is available. This study aimed to provide an accurate view of the L. monocytogenes food strain population in France. From 1999 to 2014, 1,894 L. monocytogenes strains were isolated from food at the French National Reference Laboratory for L. monocytogenes and classified according to the five risk food matrices defined by the European Food Safety Authority (EFSA). A total of 396 strains were selected on the basis of different pulsed-field gel electrophoresis (PFGE) clusters, serotypes, and strain origins and typed by multilocus sequence typing (MLST), and the MLST results were supplemented with MLST data available from Institut Pasteur, representing human and additional food strains from France. The distribution of sequence types (STs) was compared between food and clinical strains on a panel of 675 strains. High congruence between PFGE and MLST was found. Out of 73 PFGE clusters, the two most prevalent corresponded to ST9 and ST121. Using original statistical analysis, we demonstrated that (i) there was not a clear association between ST9 and ST121 and the food matrices, (ii) serotype IIc, ST8, and ST4 were associated with meat products, and (iii) ST13 was associated with dairy products. Of the two major STs, ST121 was the ST that included the fewest clinical strains, which might indicate lower virulence. This observation may be directly relevant for refining risk analysis models for the better management of food safety.IMPORTANCE This study showed a very useful backward compatibility between PFGE and MLST for surveillance. The results enabled better understanding of the population structure of L. monocytogenes strains isolated from food and management of the health risks associated with L. monocytogenes food strains. Moreover, this work provided an accurate view of L. monocytogenes strain populations associated with specific food matrices. We clearly showed that some STs were associated with food matrices, such as meat, meat products, and dairy products. We opened the way to source attribution modeling in order to quantify the relative importance of the main food matrices.
Background/objectives: Whole genome sequencing (WGS) has proven to be a powerful subtyping tool for foodborne pathogenic bacteria like L. monocytogenes. The interests of genome-scale analysis for national surveillance, outbreak detection or source tracking has been largely documented. The genomic data however can be exploited with many different bioinformatics methods like single nucleotide polymorphism (SNP), core-genome multi locus sequence typing (cgMLST), whole-genome multi locus sequence typing (wgMLST) or multi locus predicted protein sequence typing (MLPPST) on either core-genome (cgMLPPST) or pan-genome (wgMLPPST). Currently, there are little comparisons studies of these different analytical approaches. Our objective was to assess and compare different genomic methods that can be implemented in order to cluster isolates of L. monocytogenes.Methods: The clustering methods were evaluated on a collection of 207 L. monocytogenes genomes of food origin representative of the genetic diversity of the Anses collection. The trees were then compared using robust statistical analyses.Results: The backward comparability between conventional typing methods and genomic methods revealed a near-perfect concordance. The importance of selecting a proper reference when calling SNPs was highlighted, although distances between strains remained identical. The analysis also revealed that the topology of the phylogenetic trees between wgMLST and cgMLST were remarkably similar. The comparison between SNP and cgMLST or SNP and wgMLST approaches showed that the topologies of phylogenic trees were statistically similar with an almost equivalent clustering.Conclusion: Our study revealed high concordance between wgMLST, cgMLST, and SNP approaches which are all suitable for typing of L. monocytogenes. The comparable clustering is an important observation considering that the two approaches have been variously implemented among reference laboratories.
Next‐generation sequencing (NGS) data present an untapped potential to improve microbial risk assessment (MRA) through increased specificity and redefinition of the hazard. Most of the MRA models do not account for differences in survivability and virulence among strains. The potential of machine learning algorithms for predicting the risk/health burden at the population level while inputting large and complex NGS data was explored with Listeria monocytogenes as a case study. Listeria data consisted of a percentage similarity matrix from genome assemblies of 38 and 207 strains of clinical and food origin, respectively. Basic Local Alignment (BLAST) was used to align the assemblies against a database of 136 virulence and stress resistance genes. The outcome variable was frequency of illness, which is the percentage of reported cases associated with each strain. These frequency data were discretized into seven ordinal outcome categories and used for supervised machine learning and model selection from five ensemble algorithms. There was no significant difference in accuracy between the models, and support vector machine with linear kernel was chosen for further inference (accuracy of 89% [95% CI: 68%, 97%]). The virulence genes FAM002725, FAM002728, FAM002729, InlF, InlJ, Inlk, IisY, IisD, IisX, IisH, IisB, lmo2026, and FAM003296 were important predictors of higher frequency of illness. InlF was uniquely truncated in the sequence type 121 strains. Most important risk predictor genes occurred at highest prevalence among strains from ready‐to‐eat, dairy, and composite foods. We foresee that the findings and approaches described offer the potential for rethinking the current approaches in MRA.
Campylobacter spp. are a leading and increasing cause of gastrointestinal infections worldwide. Source attribution, which apportions human infection cases to different animal species and food reservoirs, has been instrumental in control- and evidence-based intervention efforts. The rapid increase in whole-genome sequencing data provides an opportunity for higher-resolution source attribution models. Important challenges, including the high dimension and complex structure of WGS data, have inspired concerted research efforts to develop new models. We propose network analysis models as an accurate, high-resolution source attribution approach for the sources of human campylobacteriosis. A weighted network analysis approach was used in this study for source attribution comparing different WGS data inputs. The compared model inputs consisted of cgMLST and wgMLST distance matrices from 717 human and 717 animal isolates from cattle, chickens, dogs, ducks, pigs and turkeys. SNP distance matrices from 720 human and 720 animal isolates were also used. The data were collected from 2015 to 2017 in Denmark, with the animal sources consisting of domestic and imports from 7 European countries. Clusters consisted of network nodes representing respective genomes and links representing distances between genomes. Based on the results, animal sources were the main driving factor for cluster formation, followed by type of species and sampling year. The coherence source clustering (CSC) values based on animal sources were 78%, 81% and 78% for cgMLST, wgMLST and SNP, respectively. The CSC values based on Campylobacter species were 78%, 79% and 69% for cgMLST, wgMLST and SNP, respectively. Including human isolates in the network resulted in 88%, 77% and 88% of the total human isolates being clustered with the different animal sources for cgMLST, wgMLST and SNP, respectively. Between 12% and 23% of human isolates were not attributed to any animal source. Most of the human genomes were attributed to chickens from Denmark, with an average attribution percentage of 52.8%, 52.2% and 51.2% for cgMLST, wgMLST and SNP distance matrices respectively, while ducks from Denmark showed the least attribution of 0% for all three distance matrices. The best-performing model was the one using wgMLST distance matrix as input data, which had a CSC value of 81%. Results from our study show that the weighted network-based approach for source attribution is reliable and can be used as an alternative method for source attribution considering the high performance of the model. The model is also robust across the different Campylobacter species, animal sources and WGS data types used as input.
Campylobacter spp. are the most common cause of bacterial gastrointestinal infection in humans both in Denmark and worldwide. Studies have found microbial subtyping to be a powerful tool for source attribution, but comparisons of different methodologies are limited. In this study, we compare three source attribution approaches (Machine Learning, Network Analysis, and Bayesian modeling) using three types of whole genome sequences (WGS) data inputs (cgMLST, 5-Mers and 7-Mers). We predicted and compared the sources of human campylobacteriosis cases in Denmark. Using 7mer as an input feature provided the best model performance. The network analysis algorithm had a CSC value of 78.99% and an F1-score value of 67%, while the machine-learning algorithm showed the highest accuracy (98%). The models attributed between 965 and all of the 1224 human cases to a source (network applying 5mer and machine learning applying 7mer, respectively). Chicken from Denmark was the primary source of human campylobacteriosis with an average percentage probability of attribution of 45.8% to 65.4%, representing Bayesian with 7mer and machine learning with cgMLST, respectively. Our results indicate that the different source attribution methodologies based on WGS have great potential for the surveillance and source tracking of Campylobacter. The results of such models may support decision makers to prioritize and target interventions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.