Bacterial genome-wide association studies (bGWAS) capture associations between genomic variation and phenotypic variation. Convergence-based bGWAS methods identify genomic mutations that arise independently multiple times on the phylogenetic tree in the presence of phenotypic variation more often than expected by chance. This work introduces hogwash, an open-source R package that implements three algorithms for convergence-based bGWAS. Hogwash also includes two burden-testing approaches for gene- or pathway-level analysis, which can improve power and convergence detection for related but weakly penetrant genotypes. To identify optimal use cases, we applied hogwash to data simulated with a variety of phylogenetic signals and convergence distributions. These simulated data are publicly available and include metadata describing the convergence and phylogenetic signal of each phenotype and genotype. Hogwash is available for download from GitHub.
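Burden testing relies on pooling related variants into a single gene- or pathway-level feature before testing. The Python sketch below shows one common collapsing rule: a group counts as present in an isolate if any of its member variants is present. This is a generic illustration under stated assumptions and not hogwash's implementation (hogwash is an R package). The function name `collapse_to_groups`, the dictionary-based matrix layout, and the toy data are all hypothetical.

```python
from typing import Dict


def collapse_to_groups(
    variant_matrix: Dict[str, Dict[str, int]],
    variant_to_group: Dict[str, str],
) -> Dict[str, Dict[str, int]]:
    """Collapse per-variant binary calls into per-group (gene or pathway) calls.

    Hypothetical helper using an "any variant present" rule:
    a group is 1 in an isolate if ANY member variant is 1 there, else 0.
    variant_matrix maps variant -> {isolate: 0/1}.
    variant_to_group maps variant -> group name; unmapped variants are skipped.
    """
    group_matrix: Dict[str, Dict[str, int]] = {}
    for variant, calls in variant_matrix.items():
        group = variant_to_group.get(variant)
        if group is None:
            continue  # variant not assigned to any gene/pathway
        row = group_matrix.setdefault(group, {})
        for isolate, present in calls.items():
            # Keep 1 once any member variant has been seen in this isolate.
            row[isolate] = max(row.get(isolate, 0), int(bool(present)))
    return group_matrix


if __name__ == "__main__":
    # Toy data: v1 and v2 are rare variants in the same hypothetical gene.
    variants = {
        "v1": {"a": 1, "b": 0, "c": 0},
        "v2": {"a": 0, "b": 1, "c": 0},
        "v3": {"a": 0, "b": 0, "c": 0},
    }
    groups = {"v1": "geneX", "v2": "geneX", "v3": "geneY"}
    print(collapse_to_groups(variants, groups))
    # {'geneX': {'a': 1, 'b': 1, 'c': 0}, 'geneY': {'a': 0, 'b': 0, 'c': 0}}
```

Here two variants that each occur in only one isolate become a single geneX feature present in two isolates, which is the sense in which grouping can strengthen the signal from weakly penetrant genotypes.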
In a case–control study of patients with Clostridium difficile infection, we found no statistically significant association between the presence of trehalose utilization variants in infecting C. difficile strains and the development of severe infection outcomes. These results do not support the hypothesis that trehalose utilization confers enhanced virulence in human C. difficile infections.
While variant identification pipelines are becoming increasingly standardized, less attention has been paid to the pre-processing of variants before their use in bacterial genome-wide association studies (bGWAS). Three nuances of variant pre-processing affect downstream identification of genetic associations: the separation of variants at multiallelic sites, the separation of variants in overlapping genes, and the referencing of variants relative to ancestral alleles. Here we demonstrate the importance of these pre-processing steps on diverse bacterial genomic datasets and present prewas, an R package that standardizes the pre-processing of multiallelic sites, overlapping genes, and reference alleles before bGWAS. This package improves the reproducibility and interpretability of bGWAS results. prewas enables users to extract maximal information from bGWAS by implementing a multi-line representation for multiallelic sites and for variants in overlapping genes. prewas outputs a binary SNP matrix that can be used for SNP-based bGWAS and prevents the masking of minor alleles during bGWAS analysis. The optional binary gene matrix output can be used for gene-based bGWAS, enabling users to maximize the power and evolutionary interpretability of their bGWAS studies. prewas is available for download from GitHub.
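The multi-line representation of a multiallelic site can be illustrated with a small Python sketch: a site with more than one alternative allele becomes one binary row per derived allele, coded relative to the ancestral allele rather than the reference genome allele. The function name `split_multiallelic`, the dictionary layout, and the assumption that the ancestral allele is already known (for example, from ancestral state reconstruction on the phylogeny) are illustrative only; this is not the prewas API.

```python
from typing import Dict


def split_multiallelic(
    site_id: str,
    alleles_by_isolate: Dict[str, str],
    ancestral_allele: str,
) -> Dict[str, Dict[str, int]]:
    """Encode one variant site as one binary row per derived allele.

    Hypothetical helper. Each row is keyed '<site_id>_<allele>' and maps
    isolate -> 1 if the isolate carries that allele, else 0. The ancestral
    allele (assumed to be supplied by the caller) gets no row, so isolates
    carrying it are all-zero, i.e. it serves as the reference state.
    """
    derived = sorted({a for a in alleles_by_isolate.values() if a != ancestral_allele})
    return {
        f"{site_id}_{allele}": {
            isolate: int(observed == allele)
            for isolate, observed in alleles_by_isolate.items()
        }
        for allele in derived
    }


if __name__ == "__main__":
    # Hypothetical triallelic site: A is ancestral, C and G arose separately.
    site = {"iso1": "A", "iso2": "C", "iso3": "G", "iso4": "C"}
    for row_id, row in split_multiallelic("pos100", site, ancestral_allele="A").items():
        print(row_id, row)
    # pos100_C {'iso1': 0, 'iso2': 1, 'iso3': 0, 'iso4': 1}
    # pos100_G {'iso1': 0, 'iso2': 0, 'iso3': 1, 'iso4': 0}
```

A single-row encoding that scores every non-ancestral allele as 1 would merge C and G into one column, so the rarer G allele could not be tested on its own; keeping one row per allele avoids that masking.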
Clostridioides difficile has two major disease-mediating toxins, A and B, encoded within the pathogenicity locus (PaLoc). In this study, we demonstrate via multiple approaches that genomic variants outside the PaLoc are associated with changes in cytotoxicity.
Clinical disease from Clostridioides difficile infection can be mediated by two toxins and their neighboring regulatory genes encoded within the five-gene pathogenicity locus (PaLoc). We provide several lines of evidence that the toxin activity of C. difficile may be modulated by genomic variants outside of the PaLoc. We used a phylogenetic tree-based approach to demonstrate discordance between toxin activity and PaLoc evolutionary history, an elastic net method to show that PaLoc variants alone are insufficient to model toxin activity, and a convergence-based bacterial genome-wide association study (GWAS) to identify associations between non-PaLoc loci and changes in toxin activity. Combined, these data support a model of C. difficile disease wherein toxin activity may be strongly affected by many non-PaLoc loci. Additionally, we characterize multiple other in vitro phenotypes relevant to human infections, including germination and sporulation. These phenotypes vary greatly in their clonality, variability, convergence, and concordance with genomic variation. Lastly, we highlight the intersection of loci identified by GWAS for different phenotypes and for clinical severity. Identifying such overlapping loci can facilitate the discovery of genetic variation that links phenotypic variation to clinical outcomes.