2013
DOI: 10.1186/1471-2105-14-s5-s18

Optimal assembly for high throughput shotgun sequencing

Abstract: We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are…

Cited by 88 publications (91 citation statements)
References 27 publications
“…In contrast, the greedy algorithm loads only the "best" (longest) overlaps for each read end into memory. This greedy approach is optimal when the read length is sufficiently long (Bresler et al 2013), and a best overlap graph can be built using just 64 GB of memory for a mammalian genome. However, the greedy algorithm can be misled by repeats that are longer than the overlap length and is therefore prone to mis-assemblies.…”
Section: Best Overlap Graph
Citation type: mentioning
confidence: 99%
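
The greedy strategy described in the citation statement above can be illustrated with a minimal sketch. The function names `overlap_len` and `greedy_assemble`, the `min_len` threshold, and the toy reads are assumptions made for illustration. This is not the implementation of any particular assembler, and for simplicity it compares all pairs of contigs instead of storing only the best overlap per read end.

```python
from typing import List


def overlap_len(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    """
    max_k = min(len(a), len(b))
    for k in range(max_k, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the two contigs that share the longest overlap.

    Stops when no pair overlaps by at least `min_len` characters and
    returns the remaining contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap_len(a, b, min_len)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
```

For example, `greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"])` returns `["ATGCGTACGTTAGC"]`, because each adjacent pair of these toy reads overlaps by four characters. When a repeat is longer than the available overlaps, the longest overlap can join reads from different copies of the repeat, which is the mis-assembly risk the statement mentions.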
“…Despite these successes, shepherding a genome project through the process of DNA isolation, sequencing and assembly is still a challenge, especially for research groups for whom genomes are a means to another goal rather than the goal itself. For example, because high quality genome assembly relies upon long sequencing reads to bridge repetitive genomic regions (6,8,16,17) and high coverage to circumvent read errors (4,7,12), the stringent DNA isolation requirements (size, quantity and purity) for PacBio sequencing (10) intended for genome assembly are higher than those typically employed. Moreover, at present, the low average read quality produced by PacBio sequencing causes coverage requirements to be at least 50-fold (5,13,15).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Moreover, the question of the minimum coverage required may avail itself to information-theoretical bounds and near-optimal solutions, similar to those established for the assembly problem [59,60].…”
Section: Discussion
Citation type: mentioning
confidence: 99%