Campylobacter spp. are a leading and increasing cause of gastrointestinal infections worldwide. Source attribution, which apportions human infection cases to different animal species and food reservoirs, has been instrumental in control- and evidence-based intervention efforts. The rapid increase in whole-genome sequencing data provides an opportunity for higher-resolution source attribution models. Important challenges, including the high dimension and complex structure of WGS data, have inspired concerted research efforts to develop new models. We propose network analysis models as an accurate, high-resolution source attribution approach for the sources of human campylobacteriosis. A weighted network analysis approach was used in this study for source attribution comparing different WGS data inputs. The compared model inputs consisted of cgMLST and wgMLST distance matrices from 717 human and 717 animal isolates from cattle, chickens, dogs, ducks, pigs and turkeys. SNP distance matrices from 720 human and 720 animal isolates were also used. The data were collected from 2015 to 2017 in Denmark, with the animal sources consisting of domestic and imports from 7 European countries. Clusters consisted of network nodes representing respective genomes and links representing distances between genomes. Based on the results, animal sources were the main driving factor for cluster formation, followed by type of species and sampling year. The coherence source clustering (CSC) values based on animal sources were 78%, 81% and 78% for cgMLST, wgMLST and SNP, respectively. The CSC values based on Campylobacter species were 78%, 79% and 69% for cgMLST, wgMLST and SNP, respectively. Including human isolates in the network resulted in 88%, 77% and 88% of the total human isolates being clustered with the different animal sources for cgMLST, wgMLST and SNP, respectively. Between 12% and 23% of human isolates were not attributed to any animal source. Most of the human genomes were attributed to chickens from Denmark, with an average attribution percentage of 52.8%, 52.2% and 51.2% for cgMLST, wgMLST and SNP distance matrices respectively, while ducks from Denmark showed the least attribution of 0% for all three distance matrices. The best-performing model was the one using wgMLST distance matrix as input data, which had a CSC value of 81%. Results from our study show that the weighted network-based approach for source attribution is reliable and can be used as an alternative method for source attribution considering the high performance of the model. The model is also robust across the different Campylobacter species, animal sources and WGS data types used as input.
Campylobacter spp. are the most common cause of bacterial gastrointestinal infection in humans both in Denmark and worldwide. Studies have found microbial subtyping to be a powerful tool for source attribution, but comparisons of different methodologies are limited. In this study, we compare three source attribution approaches (Machine Learning, Network Analysis, and Bayesian modeling) using three types of whole genome sequences (WGS) data inputs (cgMLST, 5-Mers and 7-Mers). We predicted and compared the sources of human campylobacteriosis cases in Denmark. Using 7mer as an input feature provided the best model performance. The network analysis algorithm had a CSC value of 78.99% and an F1-score value of 67%, while the machine-learning algorithm showed the highest accuracy (98%). The models attributed between 965 and all of the 1224 human cases to a source (network applying 5mer and machine learning applying 7mer, respectively). Chicken from Denmark was the primary source of human campylobacteriosis with an average percentage probability of attribution of 45.8% to 65.4%, representing Bayesian with 7mer and machine learning with cgMLST, respectively. Our results indicate that the different source attribution methodologies based on WGS have great potential for the surveillance and source tracking of Campylobacter. The results of such models may support decision makers to prioritize and target interventions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with đź’™ for researchers
Part of the Research Solutions Family.