2014
DOI: 10.1186/1471-2105-15-s9-s4

Near-optimal assembly for shotgun sequencing with noisy reads

Abstract: Recent work identified the fundamental limits on the information requirements in terms of read length and coverage depth required for successful de novo genome reconstruction from shotgun sequencing data, based on the idealistic assumption of no errors in the reads (noiseless reads). In this work, we show that even when there is noise in the reads, one can successfully reconstruct with information requirements close to the noiseless fundamental limit. A new assembly algorithm, X-phased Multibridging, is design…
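The coverage side of a noiseless information limit can be illustrated with the classic Lander-Waterman model. This is a minimal sketch under stated assumptions, not the paper's X-phased Multibridging algorithm: it assumes error-free reads of fixed length L whose start positions are uniform over a genome of length G, ignores repeats, and counts any break in coverage as a gap. With N reads and coverage c = NL/G, the expected number of gaps is about N e^(-c); requiring that to be at most ε also bounds the probability of any gap by ε (Markov's inequality). The function names and example numbers are hypothetical.

```python
import math


def expected_gaps(genome_len: int, read_len: int, coverage: float) -> float:
    """Expected number of coverage gaps under the Lander-Waterman model.

    Assumes error-free reads of fixed length whose start positions are
    uniform over the genome, with no minimum-overlap requirement.
    """
    n_reads = coverage * genome_len / read_len
    return n_reads * math.exp(-coverage)


def min_coverage(genome_len: int, read_len: int, eps: float = 0.01) -> float:
    """Smallest coverage c with expected_gaps(...) <= eps, found by bisection.

    By Markov's inequality this also keeps P(at least one gap) <= eps.
    expected_gaps is decreasing in c for c > 1, so we search on [1, hi].
    """
    lo, hi = 1.0, 2.0
    while expected_gaps(genome_len, read_len, hi) > eps:
        hi *= 2.0  # grow the bracket until the target is met
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_gaps(genome_len, read_len, mid) > eps:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical example: 5 Mb genome, 100 bp reads, 1% failure target.
    c = min_coverage(5_000_000, 100, eps=0.01)
    print(f"required coverage ~ {c:.1f}x")
```

For these hypothetical inputs the required coverage comes out at about 18.3x. Note that the model says nothing about repeats, which is the read-length side of the limit.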

Cited by 46 publications (28 citation statements); references 18 publications.
“…With current sequencing chemistries, PacBio read lengths are ∼16 kb on average but reach ∼50 kb, which can bridge repetitive regions not easily resolved with short read technology. Although PacBio reads have a high error rate (∼15%), because these errors are randomly distributed, several approaches can correct the reads for use in de novo assembly (Koren et al 2012; Ross et al 2013; Chaisson et al 2014; Lam et al 2014). Hybrid approaches use deep coverage from Illumina reads for error correction of the raw PacBio reads (Koren et al 2012).…”
Section: Supplemental Materials Is Available For This Article (mentioning)
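Why randomly distributed errors can be corrected given enough depth can be made concrete with per-column majority voting. This is a simplified sketch, not the method of any cited tool: it assumes the reads are already aligned into pileup columns and that each read independently reports a wrong base with probability p. Real PacBio errors are mostly insertions and deletions, so practical correctors must align reads before any voting. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(column: list[str]) -> str:
    """Return the most frequent base in one aligned pileup column."""
    return Counter(column).most_common(1)[0][0]


def consensus_error_bound(coverage: int, p: float) -> float:
    """Upper bound on the probability that majority voting miscalls a column.

    Assumes each of `coverage` reads independently reports a wrong base with
    probability p. If more than half of the reads are correct, the true base
    has a strict majority and the vote is correct, so a miscall requires at
    most coverage // 2 correct reads.
    """
    q = 1.0 - p
    return sum(
        comb(coverage, k) * q**k * p ** (coverage - k)
        for k in range(coverage // 2 + 1)
    )


if __name__ == "__main__":
    print(majority_vote(list("AACAT")))  # most common base in the column
    # Per-base error rate of ~15%, as in the quoted passage.
    for c in (1, 5, 15, 31):
        print(c, consensus_error_bound(c, 0.15))
```

The bound is conservative (it ignores that wrong bases must also agree with each other to win), yet it still shrinks quickly as coverage grows.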
“…Theoretical analyses of assembly show that error-prone reads are nearly as informative as error-free reads, suggesting that read accuracy is less important than length (43,44). Unicycler's performance on the simulated read sets matched these findings.…”
Section: Hybrid Assembly Of Simulated Long and Short Read Datasets (mentioning)
“…Paired end short reads from different sized longer inserts can improve contiguity, but uncertainty of fragment length and the lack of sequence between the insert ends make resolving many repetitive structures challenging (5). Longer reads can circumvent this problem, even when such reads exhibit error rates as high as 20% (58). Importantly, error-prone reads can be corrected, provided there is sufficient coverage and the errors are approximately uniformly distributed.…”
Section: Introduction (mentioning)
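Why longer reads resolve repeats can be quantified with a simple idealized model. A read bridges one copy of a repeat of length l if it covers the whole copy plus at least one unique flanking base on each side, so its start must fall in a window of L - l - 1 positions for read length L. Assuming read starts form a Poisson process with rate c/L per base (coverage c), the number of bridging reads is Poisson with mean c(L - l - 1)/L. This sketch ignores read errors and paired-end information; the function name and example numbers are hypothetical.

```python
import math


def bridging_probability(coverage: float, read_len: int, repeat_len: int) -> float:
    """Probability that at least one read bridges a given copy of a repeat.

    A read bridges the copy if it spans the repeat plus one flanking base on
    each side, i.e. it starts in a window of read_len - repeat_len - 1
    positions. Read starts are modelled as a Poisson process with rate
    coverage / read_len per base, so bridging reads are Poisson distributed.
    """
    window = read_len - repeat_len - 1
    if window <= 0:
        return 0.0  # reads are too short to span the repeat and both flanks
    mean_bridging_reads = coverage * window / read_len
    return 1.0 - math.exp(-mean_bridging_reads)


if __name__ == "__main__":
    # Hypothetical 5 kb repeat at 30x coverage: short vs long reads.
    for read_len in (150, 10_000, 16_000):
        print(read_len, bridging_probability(30.0, read_len, 5_000))
```

Short reads can never bridge a repeat longer than themselves, no matter the coverage, which is the gap that longer (even error-prone) reads fill.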
“…Despite these successes, shepherding a genome project through the process of DNA isolation, sequencing and assembly is still a challenge, especially for research groups for whom genomes are a means to another goal rather than the goal itself. For example, because high quality genome assembly relies upon long sequencing reads to bridge repetitive genomic regions (6,8,16,17) and high coverage to circumvent read errors (4,7,12), the stringent DNA isolation requirements (size, quantity and purity) for PacBio sequencing (10) intended for genome assembly are higher than those typically employed. Moreover, at present, the low average read quality produced by PacBio sequencing causes coverage requirements to be at least 50-fold (5,13,15).…”
Section: Introduction (mentioning)
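A coverage requirement such as the quoted 50-fold translates directly into sequencing volume: total bases equal genome size times coverage, and the number of reads is that total divided by the mean read length. A minimal sketch; the function name, the 100 Mb genome size, and the 16 kb mean read length used below are hypothetical inputs.

```python
def sequencing_volume(genome_len_bp: float, coverage: float,
                      mean_read_len_bp: float) -> tuple[float, float]:
    """Return (total sequenced bases, approximate number of reads).

    Coverage c means each genomic base is covered c times on average, so the
    reads must contain genome_len_bp * c bases in total.
    """
    total_bases = genome_len_bp * coverage
    n_reads = total_bases / mean_read_len_bp
    return total_bases, n_reads


if __name__ == "__main__":
    # Hypothetical 100 Mb genome at 50-fold coverage with 16 kb reads.
    bases, reads = sequencing_volume(100e6, 50, 16_000)
    print(f"{bases / 1e9:.1f} Gb in about {reads:,.0f} reads")
```

With these inputs, 50-fold coverage of a 100 Mb genome needs 5 Gb of raw sequence, or roughly 312,500 reads of 16 kb.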