Extending assembly of short DNA sequences to handle error

Jeck, William R.; Reinhardt, Josephine A.; Baltrus, David A.; Hickenbotham, Matthew T.; Magrini, Vincent; Mardis, Elaine R.; Dangl, Jeffery L.; Jones, Corbin D.

doi:10.1093/bioinformatics/btm451

Cited by 220 publications

(129 citation statements)

References 4 publications

Mentioning: 126


“…40 The success of the recently introduced NGS assemblers is mainly caused by the development of pragmatic engineering and heuristics on assembly algorithms. 39 Some of the tools, such as SSAKE, 41 SHARCGS, 42 VCAKE, 43 and QSRA, 44 work by using greedy graph strategy. Programs applying this algorithm undertake one basic operation: iterative extension (that is, given any read or contig, it will merge with the one with the largest overlap).…”

Section: Assembly Strategies (mentioning)

confidence: 99%
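The iterative extension described in the statement above (merge the current read or contig with whichever read has the largest overlap) can be sketched as follows. This is a minimal illustration of the generic greedy idea only, not the actual implementation of SSAKE, SHARCGS, VCAKE or QSRA; the function names `overlap` and `greedy_extend` and the `min_overlap` parameter are hypothetical, and the sketch assumes error-free reads and extends only toward the 3' end.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_extend(seed: str, reads: list[str], min_overlap: int = 3) -> str:
    """Extend `seed` rightwards, each time merging the unused read with the largest overlap.

    Assumption for illustration: exact matching only, no handling of
    sequencing errors, reverse complements, or ties between candidates.
    """
    contig = seed
    unused = list(reads)
    while unused:
        best_read, best_len = None, 0
        for read in unused:
            k = overlap(contig, read, min_overlap)
            if k > best_len:
                best_read, best_len = read, k
        if best_read is None:
            break  # no read overlaps the contig end; extension stops
        contig += best_read[best_len:]  # append only the non-overlapping tail
        unused.remove(best_read)
    return contig


if __name__ == "__main__":
    print(greedy_extend("ATGCGT", ["GCGTAC", "TACGGA"]))
```

Picking the single largest overlap keeps each step cheap, but it is exactly the choice that can go wrong in repeats or when reads contain errors, which is the weakness the error-tolerant assemblers named above set out to address.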

“…Programs applying this algorithm undertake one basic operation: iterative extension (that is, given any read or contig, it will merge with the one with the largest overlap). The three programs (SSAKE, VCAKE and QSRA) have been developed to handle imperfectly matching reads, 41,43,44 whereas SHARCGS is widely used on uniform-length, high-coverage and unpaired short reads. QSRA, the most recently developed software in this category, has an advantage in quality-value scores to help users deal with base call errors.…”

Section: Assembly Strategies (mentioning)

confidence: 99%


Evaluation of next-generation sequencing software in mapping and assembly

et al. 2011


Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages when compared with traditional sequencing methods in terms of higher sequencing speed, lower per run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, thereby restricting the applications of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies meet and particularly illustrated various bioinformatics attempts on mapping and assembly for problem solving. We then compared the performance of several programs in these two fields, and further provided advice on selecting suitable tools for specific biological applications.


“…As an alternative, a prefix tree-based approach was introduced by Warren et al (2007) with their early work on SSAKE. This paradigm was also followed in the VCAKE algorithm by Jeck et al (2007), and in the SHARCGS algorithm by Dohm et al (2007). On a third branch, Edena (Hernandez et al 2008) was an adaptation of the traditional overlap-layout-consensus model to short reads.…”

mentioning

confidence: 99%

ABySS: A parallel assembler for short read sequence data

Simpson, Wong, Jackman et al. 2009

Genome Res.


Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.
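The abstract above reports an N50 size without defining it. Under the commonly used definition, N50 is the contig length L such that contigs of length at least L together account for at least half of the total assembled bases. The sketch below computes that statistic; the function name `n50` is hypothetical, and whether ABySS applied any additional filtering beyond the stated 100 bp cutoff is not given in the text.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths (0 for an empty list).

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Toy contig lengths, not data from the paper.
    print(n50([2, 3, 4, 5, 6]))
```

Because the statistic is weighted by bases rather than by contig count, a few long contigs can dominate it even when most contigs are short.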


“…The overlap-and-extend approach, e.g., SSAKE [6], VCAKE [7] and SHARCGS [8], first determines overlapped reads, i.e., the suffix of a read is the same as the prefix […] IDBA-MTP: Extension of IDBA for assembling prokaryotic metatranscriptomic data. It assembles reads by applying known protein reference sequences.…”

Section: Existing Approaches (mentioning)

confidence: 99%

Sequence assembly using next generation sequencing data—challenges and solutions

Chin, Leung, Yiu 2014

Sci. China Life Sci.


Sequence assembling is an important step for bioinformatics study. With the help of next generation sequencing (NGS) technology, high-throughput DNA fragments (reads) can be randomly sampled from DNA or RNA molecular sequences. However, as the positions of the sampled reads are unknown, an assembly process is required to combine overlapping reads and reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing methods, although the throughput of NGS reads increases, the read length is shorter and the error rate is higher, which introduces several problems in assembly. Moreover, paired-end reads, which contain more information, can be sampled instead of single-end reads. Existing assemblers cannot fully utilize this information and fail to assemble longer contigs. In this article, we will revisit the major problems of assembling NGS reads on genomic, transcriptomic, metagenomic and metatranscriptomic data. We will also describe our IDBA package for solving these problems. The IDBA package has adopted several novel ideas in assembling, including using multiple k, local assembling and progressive depth removal. Compared with existing assemblers, IDBA has better performance on many simulated and real sequencing datasets.

Keywords: genomic assembling, de Bruijn graph, paired-end reads, next generation sequencing

Citation: Chin FYL, Leung HCM, Yiu SM. Sequence assembly using next generation sequencing data-challenges and solutions.


Abstract: http://152.2.15.114/~labweb/VCAKE
