Efficient de novo assembly of large genomes using compressed data structures

Simpson, Jared T.; Durbin, Richard

doi:10.1101/gr.126953.111

Cited by 693 publications

(626 citation statements)

References 29 publications

Supporting

Mentioning

622

Contrasting

Unclassified


“…Illumina reads were additionally used to estimate the genome size of the taxa. A preqc module (Simpson, 2014) from the SGA de novo genome assembler package (Simpson and Durbin, 2012) was used for this estimation. This utility also enabled an estimation of heterozygosity and repeat content in the genome.…”

Section: Methods (mentioning)

confidence: 99%

Sequencing of Australian wild rice genomes reveals ancestral relationships with domesticated rice

Brożyńska

Copetti

Furtado

et al. 2017

Plant Biotechnology Journal


Summary: The related A genome species of the Oryza genus are the effective gene pool for rice. Here, we report draft genomes for two Australian wild A genome taxa: O. rufipogon‐like population, referred to as Taxon A, and O. meridionalis‐like population, referred to as Taxon B. These two taxa were sequenced and assembled by integration of short‐ and long‐read next‐generation sequencing (NGS) data to create a genomic platform for a wider rice gene pool. Here, we report that, despite the distinct chloroplast genome, the nuclear genome of the Australian Taxon A has a sequence that is much closer to that of domesticated rice (O. sativa) than to the other Australian wild populations. Analysis of 4643 genes in the A genome clade showed that the Australian annual, O. meridionalis, and related perennial taxa have the most divergent (around 3 million years) genome sequences relative to domesticated rice. A test for admixture showed possible introgression into the Australian Taxon A (diverged around 1.6 million years ago) especially from the wild indica/O. nivara clade in Asia. These results demonstrate that northern Australia may be the centre of diversity of the A genome Oryza and suggest the possibility that this might also be the centre of origin of this group and represent an important resource for rice improvement.


“…The trimmed sequences from all 14 samples were used to create a combined metagenome assembly with SGA, version 0.9.18 (Simpson and Durbin, 2012). First, pre-processing of the reads was carried out using the preprocess module of SGA with the -phred64 flag, with settings -p1 for paired-end libraries and -permute-ambiguous to treat ambiguities as all possible combinations of characters.…”

Section: Assembly (mentioning)

confidence: 99%

Erratum: Divergent functional isoforms drive niche specialisation for nutrient acquisition and use in rumen microbiome

Rubino

Carberry

Waters

et al. 2017

The ISME Journal


Many microbes in complex competitive environments share genes for acquiring and utilising nutrients, questioning whether niche specialisation exists and if so, how it is maintained. We investigated the genomic signatures of niche specialisation in the rumen microbiome, a highly competitive, anaerobic environment, with limited nutrient availability determined by the biomass consumed by the host. We generated individual metagenomic libraries from 14 cows fed an ad libitum diet of grass silage and calculated functional isoform diversity for each microbial gene identified. The animal replicates were used to calculate confidence intervals to test for differences in diversity of functional isoforms between microbes that may drive niche specialisation. We identified 153 genes with significant differences in functional isoform diversity between the two most abundant bacterial genera in the rumen (Prevotella and Clostridium). We found Prevotella possesses a more diverse range of isoforms capable of degrading hemicellulose, whereas Clostridium for cellulose. Furthermore, significant differences were observed in key metabolic processes indicating that isoform diversity plays an important role in maintaining their niche specialisation. The methods presented represent a novel approach for untangling complex interactions between microorganisms in natural environments and have resulted in an expanded catalogue of gene targets central to rumen cellulosic biomass degradation.


“…For strain aware assembly, it is helpful to process reads at their full length, because this increases the power to distinguish low-frequent, co-occurring true mutations from sequencing errors. In this line, there has been recent evidence that shorter genomes can be assembled through overlap graph based approaches, which make use of full-length reads, using short reads (Simpson and Durbin, 2012). It was also shown that one can perform strain aware assembly through iterative construction of overlap graphs (Töpfer et al, 2014).…”

Section: Introduction (mentioning)

confidence: 99%

Snowball: strain aware gene assembly of metagenomes

2016


Motivation: Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the studied species to be available. Results: We have developed Snowball, a novel strain aware gene assembler for shotgun metagenomic data that does not require closely related reference genomes to be available. It uses profile hidden Markov models (HMMs) of gene domains of interest to guide the assembly. Our assembler performs gene assembly of individual gene domains based on read overlaps and error correction using read quality scores at the same time, which results in very low per-base error rates. Availability and Implementation: The software runs on a user-defined number of processor cores in parallel, runs on a standard laptop and is available under the GPL 3.0 license for installation under Linux or OS X at https://github.com/hzi-bifo/snowball.

