2016
DOI: 10.1073/pnas.1604560113

Assembly of long error-prone reads using de Bruijn graphs

Abstract: The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone…

Citations: cited by 298 publications (251 citation statements)
References: 58 publications
“…The prospect of prediction of antimicrobial susceptibility based on resistome identification holds promises for shortening time from sample to report and informing treatment decisions. Databases with comprehensive reference sequences of high quality are a necessity for these purposes. Bacteroides fragilis…”
Section: Impact Statement
mentioning
confidence: 99%
“…canariae NCTC 14382 T was previously sequenced by an Illumina HiSeq 2500 at Public Health England using the Nextera XP library preparation kit following a retrospective study on yersiniosis isolates cultured from patients between April 2004 and March 2018 (8 For ONT MinION data, the run metrics were inspected using NanoPlot (version 1.0) (14) before raw FAST5 files were base-called using Guppy (version 3.2.2) with the high accuracy model to FASTQ files. Adapters were trimmed from the raw reads by Porechop (version 0.2.4) using default parameters for SQK-RAD004 before the genome was de novo assembled with Flye (version 2.5) (15, 16). The best assembly parameters were empirically determined to include the option flags "meta" and "plasmid" with coverage reduced to 30X for initial contig assembly based on a predicted genome size of ~4.73 Mbp as informed by de novo assembly of short read Illumina data (17).…”
Section: Genome Features
mentioning
confidence: 99%
“…Similarly k-mers that occur more frequently than what would be expected given the sequencing depth and the error rate are likely to come from repetitive regions. It is a common practice to prune the k-mer space using various methodologies (Koren et al, 2017; Lin et al, 2016; Carvalho et al, 2016).…”
Section: Proposed Algorithm
mentioning
confidence: 99%
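
The frequency-based pruning mentioned in the statement above can be illustrated with a minimal Python sketch. It is not the method of any of the cited papers: the function names `count_kmers` and `prune_kmers`, the toy reads, and the thresholds `min_count` and `max_count` are all assumptions chosen for illustration. In practice the bounds would be derived from sequencing depth and error rate, and reverse complements would also need handling.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def prune_kmers(counts, min_count, max_count):
    """Keep k-mers whose frequency lies in [min_count, max_count].

    Rare k-mers are likely sequencing errors; overly frequent ones are
    likely to come from repetitive regions. Both thresholds are hypothetical.
    """
    return {kmer for kmer, c in counts.items() if min_count <= c <= max_count}


# Toy data, invented for this example.
reads = ["ACGTACGT", "CGTACGTA", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
kept = prune_kmers(counts, min_count=2, max_count=3)
print(sorted(kept))  # ['CGTA', 'GTAC', 'TACG']
```

Here `ACGT` occurs four times and is dropped as too frequent, while the k-mers seen only once (for example `CGTT`) are dropped as probable errors.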
“…Following the terminology proposed by Lin et al (2016), we identify k-mers that do not exist in the genome as non-genomic, thus characterizing k-mers present in the genome as genomic. A genomic k-mer can be repeated, if it is present multiple times in the genome, or unique, if it is not.…”
Section: Reliable K-mers
mentioning
confidence: 99%
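
The terminology in the statement above (genomic versus non-genomic, unique versus repeated) can be made concrete with a short Python sketch. It assumes the genome is known, which is not the case during de novo assembly, and it considers the forward strand only. The helper names `genome_kmer_counts` and `classify_kmer` and the toy genome are hypothetical.

```python
from collections import Counter


def genome_kmer_counts(genome, k):
    """Count occurrences of each k-mer in a known genome (forward strand only)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def classify_kmer(kmer, counts):
    """Label a k-mer as 'non-genomic', 'unique', or 'repeated'."""
    n = counts.get(kmer, 0)
    if n == 0:
        return "non-genomic"  # absent from the genome
    return "unique" if n == 1 else "repeated"


# Toy genome, invented for this example.
counts = genome_kmer_counts("ACGTACGG", k=3)
for kmer in ("ACG", "CGT", "TTT"):
    print(kmer, classify_kmer(kmer, counts))
# ACG repeated
# CGT unique
# TTT non-genomic
```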