2011
DOI: 10.1101/gr.126953.111

Efficient de novo assembly of large genomes using compressed data structures

Abstract: De novo genome sequence assembly is important both to generate new sequence assemblies for previously uncharacterized genomes and to identify the genome sequence of individuals in a reference-unbiased way. We present memory efficient data structures and algorithms for assembly using the FM-index derived from the compressed Burrows-Wheeler transform, and a new assembler based on these called SGA (String Graph Assembler). We describe algorithms to error-correct, assemble, and scaffold large sets of sequence data…
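As a rough illustration of the data structure the abstract names, the following is a minimal, uncompressed sketch of a Burrows-Wheeler transform and an FM-index backward search that counts occurrences of a pattern. It is not SGA's implementation: the names `bwt`, `FMIndex`, and `count` are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored in full rather than compressed, so it only suits short strings.

```python
from collections import Counter


def bwt(text):
    """Return (BWT string, suffix array) of text + '$' using naive suffix sorting."""
    s = text + "$"  # '$' is a sentinel that sorts before A, C, G, T
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character for suffix i is the character preceding it (wraps to '$').
    return "".join(s[i - 1] for i in sa), sa


class FMIndex:
    """Illustrative, uncompressed FM-index (hypothetical helper, not SGA's API)."""

    def __init__(self, text):
        self.bwt, self.sa = bwt(text)
        counts = Counter(self.bwt)
        # C[ch] = number of characters in the text strictly smaller than ch.
        self.C = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[ch][i] = number of occurrences of ch in bwt[:i].
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, b in enumerate(self.bwt):
            for ch in self.occ:
                self.occ[ch][i + 1] = self.occ[ch][i] + (1 if ch == b else 0)

    def count(self, pattern):
        """Count occurrences of pattern by backward search over the BWT."""
        lo, hi = 0, len(self.bwt)  # half-open interval of matching suffixes
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = FMIndex("GATTACAGATTACA")
    print(index.count("ATTA"))
```

Backward search extends the match one character at a time from the end of the pattern, so each query step needs only two table lookups. A practical index would replace the full `occ` table with sampled or compressed counts to save memory.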

Cited by 693 publications
(626 citation statements)
References 29 publications
“…Illumina reads were additionally used to estimate the genome size of the taxa. A preqc module (Simpson, 2014) from the SGA de novo genome assembler package (Simpson and Durbin, 2012) was used for this estimation. This utility also enabled an estimation of heterozygosity and repeat content in the genome.…”
Section: Methods (mentioning)
confidence: 99%
“…The trimmed sequences from all 14 samples were used to create a combined metagenome assembly with SGA, version 0.9.18 (Simpson and Durbin, 2012). First, pre-processing of the reads was carried out using the preprocess module of SGA with the -phred64 flag, with settings -p1 for paired-end libraries and -permute-ambiguous to treat ambiguities as all possible combinations of characters.…”
Section: Assembly (mentioning)
confidence: 99%
“…For strain aware assembly, it is helpful to process reads at their full length, because this increases the power to distinguish low-frequent, co-occurring true mutations from sequencing errors. In this line, there has been recent evidence that shorter genomes can be assembled through overlap graph based approaches, which make use of full-length reads, using short reads (Simpson and Durbin, 2012). It was also shown that one can perform strain aware assembly through iterative construction of overlap graphs (Töpfer et al, 2014).…”
Section: Introduction (mentioning)
confidence: 99%