The genome of Bordetella pertussis is complex, with high G+C content and many repeats, each longer than 1000 bp. Long-read sequencing offers the opportunity to produce single-contig B. pertussis assemblies using sequencing reads which are longer than the repetitive sections, with the potential to reveal genomic features which were previously unobservable in multi-contig assemblies produced by short-read sequencing alone. We used an R9.4 MinION flow cell and barcoding to sequence five B. pertussis strains in a single sequencing run. We then trialled combinations of the many nanopore user community-built long-read analysis tools to establish the current optimal assembly pipeline for B. pertussis genome sequences. This pipeline produced closed genome sequences for four strains, allowing visualization of inter-strain genomic rearrangement. Read mapping to the Tohama I reference genome suggests that the remaining strain contains an ultra-long duplicated region (almost 200 kbp), which was not resolved by our pipeline; further investigation also revealed that a second strain that was seemingly resolved by our pipeline may contain an even longer duplication, albeit in a small subset of cells. We have therefore demonstrated the ability to resolve the structure of several B. pertussis strains per single barcoded nanopore flow cell, but the genomes with highest complexity (e.g. very large duplicated regions) remain only partially resolved using the standard library preparation and will require an alternative library preparation method. For full strain characterization, we recommend hybrid assembly of long and short reads together; for comparison of genome arrangement, assembly using long reads alone is sufficient.
The genome of Bordetella pertussis is complex, with high GC content and many repeats, each longer than 1,000 bp. Short-read DNA sequencing is unable to resolve the structure of the genome; however, long-read sequencing offers the opportunity to produce single-contig B. pertussis assemblies using sequencing reads which are longer than the repetitive sections. We used an R9.4 MinION flow cell and barcoding to sequence five B. pertussis strains in a single sequencing run. We then trialled combinations of the many nanopore-user-community-built long-read analysis tools to establish the current optimal assembly pipeline for B. pertussis genome sequences. Our best long-read-only assemblies were produced by Canu read correction followed by assembly with Flye and polishing with Nanopolish, whilst the best hybrids (using nanopore and Illumina reads together) were produced by Canu correction followed by Unicycler. This pipeline produced closed genome sequences for four strains, revealing inter-strain genomic rearrangement. However, read mapping to the Tohama I reference genome suggests that the remaining strain contains an ultra-long duplicated region (over 100 kbp), which was not resolved by our pipeline. We have therefore demonstrated the ability to resolve the structure of several B. pertussis strains per single barcoded nanopore flow cell, but the genomes with highest complexity (e.g. very large duplicated regions) remain only partially resolved using the standard library preparation and will require an alternative library preparation method. For full strain characterisation, we recommend hybrid assembly of long and short reads together; for comparison of genome arrangement, assembly using long reads alone is sufficient.
The evolution of Bordetella pertussis from a common ancestor similar to Bordetella bronchiseptica has occurred through large-scale gene loss, inactivation and rearrangements, largely driven by the spread of insertion sequence element repeats throughout the genome. B. pertussis is widely considered to be monomorphic, and recent evolution of the B. pertussis genome appears to be driven, at least in part, by vaccine-based selection. Given the recent global resurgence of whooping cough despite the widespread use of vaccination, a more thorough understanding of B. pertussis genomics could be highly informative. In this chapter we discuss the evolution of B. pertussis, including how vaccination is changing the circulating B. pertussis population at the gene level, and how new sequencing technologies are revealing previously unknown levels of inter- and intra-strain variation at the genome level.
Bacterial genetic diversity is often described solely in terms of base-pair changes, although a wide variety of other mutation types are likely to be major contributors. Tandem duplications of genomic loci are thought to be widespread among bacteria but, because of their often intractable size and instability, comprehensive studies of the range and genome dynamics of these mutations are rare. We define a methodology to investigate duplications in bacterial genomes based on read depth of genome sequence data as a proxy for copy number. We demonstrate the approach with Bordetella pertussis, whose insertion sequence element-rich genome provides extensive scope for duplications to occur. Analysis of genome sequence data for 2430 B. pertussis isolates identified 272 putative duplications, of which 94% were located at 11 hotspot loci. We demonstrate limited phylogenetic connection for the occurrence of duplications, suggesting unstable and sporadic characteristics. Genome instability was further described in vitro using long-read sequencing via the Nanopore platform. Clonally derived laboratory cultures produced heterogeneous populations containing multiple structural variants. Short-read data were used to predict 272 duplications, whilst long reads generated on the Nanopore platform enabled the in-depth study of the genome dynamics of tandem duplications in B. pertussis. Our work reveals the unrecognised and dynamic genetic diversity of B. pertussis and, as the complexity of the B. pertussis genome is not unique, highlights the need for a holistic and fundamental understanding of bacterial genetics.
Bacterial genetic diversity is often described solely using base-pair changes, although a wide variety of other mutation types are likely to be major contributors. Tandem duplications/amplifications are thought to be widespread among bacteria but, because of their often-intractable size and instability, comprehensive studies of these mutations are rare. We define a methodology to investigate amplifications in bacterial genomes based on read depth of genome sequence data as a proxy for copy number. We demonstrate the approach with Bordetella pertussis, whose insertion sequence element-rich genome provides extensive scope for amplifications to occur. Analysis of data for 2430 B. pertussis isolates identified 272 putative amplifications, of which 94% were located at 11 hotspot loci. We demonstrate limited phylogenetic connection for the occurrence of amplifications, suggesting unstable and sporadic characteristics. Genome instability was further described in vitro using long-read sequencing via the Nanopore platform, which revealed that clonally derived laboratory cultures rapidly produced heterogeneous populations. We extended this research to analyse a population of 1000 isolates of another important pathogen, Mycobacterium tuberculosis. We found 590 amplifications in M. tuberculosis and, as in B. pertussis, these occurred primarily at hotspots. Genes amplified in B. pertussis include those involved in motility and respiration, whilst in M. tuberculosis, functions included intracellular growth and regulation of virulence. Using publicly available short-read data we predicted previously unrecognized, large amplifications in B. pertussis and M. tuberculosis. This reveals the unrecognized and dynamic genetic diversity of B. pertussis and M. tuberculosis, highlighting the need for a more holistic understanding of bacterial genetics.
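The idea of using read depth as a proxy for copy number can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size, the 1.5x depth-ratio threshold and the minimum run length are all assumptions chosen for illustration. The sketch averages per-base depth in fixed windows, normalises each window by the genome-wide median, and reports runs of consecutive elevated windows as putative amplifications.

```python
from statistics import median


def window_means(depth, window):
    """Mean depth of each non-overlapping window (a trailing partial window is dropped)."""
    return [
        sum(depth[i:i + window]) / window
        for i in range(0, len(depth) - window + 1, window)
    ]


def call_amplifications(depth, window=1000, min_ratio=1.5, min_windows=3):
    """Return (start, end, estimated_copy_number) for runs of elevated depth.

    depth       -- per-base read depth along one contig (list of numbers)
    window      -- window size in bp (hypothetical default)
    min_ratio   -- window depth / median depth needed to count as elevated
                   (hypothetical threshold, not taken from the source)
    min_windows -- minimum number of consecutive elevated windows to report
    """
    means = window_means(depth, window)
    if not means:
        return []
    baseline = median(means)  # assumed to represent single-copy depth
    if baseline == 0:
        return []

    calls = []
    run_start = None
    # A sentinel window of depth 0 closes any run that reaches the contig end.
    for idx, mean_depth in enumerate(means + [0.0]):
        elevated = mean_depth / baseline >= min_ratio
        if elevated and run_start is None:
            run_start = idx
        elif not elevated and run_start is not None:
            if idx - run_start >= min_windows:
                run = means[run_start:idx]
                copy_number = round(sum(run) / len(run) / baseline, 2)
                calls.append((run_start * window, idx * window, copy_number))
            run_start = None
    return calls


if __name__ == "__main__":
    # Synthetic example: a 5 kbp region sequenced at twice the background depth.
    synthetic = [30] * 10_000 + [60] * 5_000 + [30] * 10_000
    print(call_amplifications(synthetic))
```

In practice the per-base depth would come from mapping short reads to a reference genome, and a heterogeneous culture in which only a subset of cells carries the amplification would give a depth ratio between 1 and 2, so any fixed threshold is a trade-off between sensitivity and false calls.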