Cryptococcus gattii recently emerged as the causative agent of cryptococcosis in healthy individuals in western North America, despite previous characterization of the fungus as a pathogen in tropical or subtropical regions. As a foundation to study the genetics of virulence in this pathogen, we sequenced the genomes of a strain (WM276) representing the predominant global molecular type (VGI) and a clinical strain (R265) of the major genotype (VGIIa) causing disease in North America. We compared these C. gattii genomes with each other and with the genomes of representative strains of the two varieties of Cryptococcus neoformans that generally cause disease in immunocompromised people. Our comparisons included chromosome alignments, analysis of gene content and gene family evolution, and comparative genome hybridization (CGH). These studies revealed that the genomes of the two representative C. gattii strains (genotypes VGI and VGIIa) are colinear for the majority of chromosomes, with some minor rearrangements. However, multiortholog phylogenetic analysis and an evaluation of gene/sequence conservation support the existence of speciation within the C. gattii complex. More extensive chromosome rearrangements were observed upon comparison of the C. gattii and the C. neoformans genomes. Finally, CGH revealed considerable variation in clinical and environmental isolates as well as changes in chromosome copy numbers in C. gattii isolates displaying fluconazole heteroresistance.
IMPORTANCE Isolates of Cryptococcus gattii are currently causing an outbreak of cryptococcosis in western North America, and most of the cases occurred in the absence of coinfection with HIV. This pattern is therefore in stark contrast to the current global burden of one million annual cases of cryptococcosis, caused by the related species Cryptococcus neoformans, in the HIV/AIDS population. The genome sequences of two outbreak-associated major genotypes of C. gattii reported here provide insights into genome variation within and between cryptococcal species. These sequences also provide a resource to further evaluate the epidemiology of cryptococcal disease and to evaluate the role of pathogen genes in the differential interactions of C. gattii and C. neoformans with immunocompromised and immunocompetent hosts.
Motivation Sequencing of human genomes is now routine, and assembly of shotgun reads is increasingly feasible. However, assemblies often fail to inform about chromosome-scale structure due to a lack of linkage information over long stretches of DNA, a shortcoming that is being addressed by new sequencing protocols, such as the GemCode and Chromium linked reads from 10x Genomics. Results Here, we present ARCS, an application that utilizes the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. We show how the contiguity of an ABySS H. sapiens genome assembly can be increased over six-fold, using moderate coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information contained in linked read data for connecting high-quality sequences in genome assembly drafts. Availability and implementation https://github.com/bcgsc/ARCS/ Supplementary information Supplementary data are available at Bioinformatics online.
Motivation In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. Although utilities exist for genome base polishing, they work best with high read coverage and do not scale well. We developed ntEdit, a Bloom filter-based genome sequence editing utility that scales to large mammalian and conifer genomes. Results We first tested ntEdit and the state-of-the-art assembly improvement tools GATK, Pilon and Racon on controlled Escherichia coli and Caenorhabditis elegans sequence data. Generally, ntEdit performs well at low sequence depths (<20×), fixing the majority (>97%) of base substitutions and indels, and its performance is largely constant with increased coverage. In all experiments conducted using a single CPU, the ntEdit pipeline executed in <14 s and <3 min, on average, on E. coli and C. elegans, respectively. We performed similar benchmarks on a sub-20× coverage human genome sequence dataset, inspecting accuracy and resource usage in editing chromosomes 1 and 21, and the whole genome. ntEdit scaled linearly, executing in 30–40 min on those sequences. We show how ntEdit ran in <2 h 20 min to improve upon long and linked read human genome assemblies of NA12878, using high-coverage (54×) Illumina sequence data from the same individual, fixing frame shifts in coding sequences. We also generated 17-fold coverage spruce sequence data from haploid sequence sources (seed megagametophyte), and used it to edit our pseudo haploid assemblies of the 20 Gb interior and white spruce genomes in <4 and <5 h, respectively, making roughly 50M edits at a (substitution+indel) rate of 0.0024. Availability and implementation https://github.com/bcgsc/ntedit Supplementary information Supplementary data are available at Bioinformatics online.
Despite major advances in DNA sequencing technologies, we do not yet have complete genome sequences. Producing high-quality, contiguous draft assemblies de novo is of paramount importance, as it informs on the genetic content and organization of the genome (Pagani et al. 2012). The past decade has seen improvements in sequence throughput, a substantially lower DNA sequencing cost and increased read lengths. Whereas the base accuracy of short (currently ~250 bp) reads such as those from Illumina has improved (>99%), the base accuracy of long-read platforms (Pacific Biosciences, Oxford Nanopore) remains too low to generate reference-grade genome assemblies without read error correction. Gap-filling tools designed to help finish draft genomes in an automated fashion, including our own (Paulino et al. 2015), have recently been developed (Tsai, Otto, and Berriman 2010; Boetzer and Pirovano 2012). They are typically designed to work with short sequencing reads, not high-quality long sequences from other draft assemblies. In many projects that employ short sequence reads for de novo assembly, a k-mer graph assembly approach is favored, as it effectively discards errors and spurious sequences, albeit at the cost of long-range information loss and a limited ability to resolve long repeats. Researchers therefore routinely produce several assembly drafts, varying the k-mer length k in search of the most contiguous assembly. This multitude of assembly drafts comprises sequences with untapped potential, representing a wealth of information for gap-filling and scaffolding. Here, I make available two bioinformatics software tools, Cobbler and RAILS (Warren 2016), to exploit this information for automated finishing and scaffolding with long DNA sequences, respectively. 
They can be used to scaffold and finish high-quality draft genome assemblies with any long, preferably high-quality, sequences such as scaftigs/contigs from another genome draft. They both rely on accurate, long DNA sequences to patch gaps in existing genome assembly drafts. More specifically, Cobbler is a utility to automatically patch gaps (ambiguous regions in a draft assembly, represented by N's). It does so by first aligning the long sequences to the assembly, tallying the alignments and replacing the N's with the corresponding stretches of those long sequences. RAILS is an all-in-one scaffolder and gap-filler. Its process is similar to that of Cobbler. It scaffolds a given genome draft with the help of long DNA sequences (contig sequences are ordered/oriented using alignment information) using the scaffolding engine I originally developed for SSAKE (Warren et al. 2007) and LINKS (Warren et al. 2015). The newly created gaps are automatically filled with the DNA string of the provided long DNA sequences. In a simulated long sequences experiment (1, 2.5, 5, 15 kbp sequences) designed from the human genome reference, Cobbler closed >65% of gaps in a human genome assembly draft (Table 1; test provided with the distribution, corre...
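The gap-patching idea described above (anchor long sequences on both sides of a run of N's, tally what they propose, and substitute the best-supported sequence) can be illustrated with a small toy. This is only a sketch under strong simplifying assumptions, not Cobbler's implementation: exact-match flank lookup stands in for real sequence alignment, only the forward strand is considered, and the function name `patch_gaps` and the `flank` parameter are hypothetical names introduced for this example.

```python
import re
from collections import Counter


def patch_gaps(draft, long_seqs, flank=8):
    """Toy gap patcher: fill each run of N's with the best-supported sequence.

    Assumption: exact matching of `flank` bases on each side of the gap
    replaces the alignment step a real tool would use.
    """
    pieces = []
    last_end = 0
    for gap in re.finditer(r"N+", draft):
        left = draft[max(0, gap.start() - flank):gap.start()]
        right = draft[gap.end():gap.end() + flank]
        fills = Counter()
        # Only attempt gaps with full-length, unambiguous flanks.
        if len(left) == flank and len(right) == flank and "N" not in left + right:
            for seq in long_seqs:
                i = seq.find(left)
                if i == -1:
                    continue
                j = seq.find(right, i + flank)
                if j != -1:
                    # Tally the sequence this long sequence places in the gap.
                    fills[seq[i + flank:j]] += 1
        pieces.append(draft[last_end:gap.start()])
        # Use the most frequently proposed fill; otherwise keep the N's.
        pieces.append(fills.most_common(1)[0][0] if fills else gap.group())
        last_end = gap.end()
    pieces.append(draft[last_end:])
    return "".join(pieces)


draft = "AACCGGTTAC" + "NNNN" + "TGCATGCAGG"
long_seqs = ["TTAACCGGTTAC" + "GATTACA" + "TGCATGCAGGCC"]
print(patch_gaps(draft, long_seqs))
```

In this toy the length of the N run is ignored, so the fill may be longer or shorter than the gap it replaces; gaps with no supporting long sequence are left untouched.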