The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate “hybrid” assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.
Background Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. Results Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences (‘polishing’) with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. Conclusions Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species. Electronic supplementary material The online version of this article (10.1186/s13059-019-1727-y) contains supplementary material, which is available to authorized users.
Illumina sequencing platforms have enabled widespread bacterial whole genome sequencing. While Illumina data is appropriate for many analyses, its short read length limits its ability to resolve genomic structure. This has major implications for tracking the spread of mobile genetic elements, including those which carry antimicrobial resistance determinants. Fully resolving a bacterial genome requires long-read sequencing such as those generated by Oxford Nanopore Technologies (ONT) platforms. Here we describe our use of the ONT MinION to sequence 12 isolates of Klebsiella pneumoniae on a single flow cell. We assembled each genome using a combination of ONT reads and previously available Illumina reads, and little to no manual intervention was needed to achieve fully resolved assemblies using the Unicycler hybrid assembler. Assembling only ONT reads with Canu was less effective, resulting in fewer resolved genomes and higher error rates even following error correction with Nanopolish. We demonstrate that multiplexed ONT sequencing is a valuable tool for high-throughput bacterial genome finishing. Specifically, we advocate the use of Illumina sequencing as a first analysis step, followed by ONT reads as needed to resolve genomic structure.
Mobile genetic elements (MGEs) that frequently transfer within and between bacterial species play a critical role in bacterial evolution, and often carry key accessory genes that associate with a bacteria’s ability to cause disease. MGEs carrying antimicrobial resistance (AMR) and/or virulence determinants are common in the opportunistic pathogen Klebsiella pneumoniae, which is a leading cause of highly drug-resistant infections in hospitals. Well-characterised virulence determinants in K. pneumoniae include the polyketide synthesis loci ybt and clb (also known as pks), encoding the iron-scavenging siderophore yersiniabactin and genotoxin colibactin, respectively. These loci are located within an MGE called ICEKp, which is the most common virulence-associated MGE of K. pneumoniae, providing a mechanism for these virulence factors to spread within the population. Here we apply population genomics to investigate the prevalence, evolution and mobility of ybt and clb in K. pneumoniae populations through comparative analysis of 2498 whole-genome sequences. The ybt locus was detected in 40 % of K. pneumoniae genomes, particularly amongst those associated with invasive infections. We identified 17 distinct ybt lineages and 3 clb lineages, each associated with one of 14 different structural variants of ICEKp. Comparison with the wider population of the family Enterobacteriaceae revealed occasional ICEKp acquisition by other members. The clb locus was present in 14 % of all K. pneumoniae and 38.4 % of ybt+ genomes. Hundreds of independent ICEKp integration events were detected affecting hundreds of phylogenetically distinct K. pneumoniae lineages, including at least 19 in the globally-disseminated carbapenem-resistant clone CG258. A novel plasmid-encoded form of ybt was also identified, representing a new mechanism for ybt dispersal in K. pneumoniae populations. These data indicate that MGEs carrying ybt and clb circulate freely in the K. pneumoniae population, including among multidrug-resistant strains, and should be considered a target for genomic surveillance along with AMR determinants.
The latent transcription factor Stat3 is activated by gp130, the common receptor for the interleukin (IL)-6 cytokine family and other growth factor and cytokine receptors. Ligand-induced dimerization of gp130 leads to activation of the Stat1, Stat3 and Shp2-Ras-Erk signaling pathways. Here we assess genetically the contribution of exaggerated Stat3 activation to the phenotype of gp130 (Y757F/Y757F) mice, in which a knock-in mutation disrupts the negative feedback mechanism on gp130-dependent Stat signaling. Compared to gp130 (Y757F/Y757F) mice, reduced Stat3 activation in gp130 (Y757F/Y757F) Stat3(+/-) mice increased their lifespan, prevented splenomegaly, normalized exaggerated hepatic acute-phase response and lymphocyte trafficking, and suppressed the growth of spontaneously arising gastric adenomas in young mice. These lesions share histological features of gastric polyps in aging mice with monoallelic null mutations in Smad4, which encodes the common transducer for transforming growth factor (TGF)-beta signaling. Indeed, hyperactivation of Stat3 desensitizes gp130 (Y757F/Y757F) cells to the cytostatic effect of TGF-beta through transcriptional induction of inhibitory Smad7, thereby providing a novel link for cross-talk between Stat and Smad signaling in gastric homeostasis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.