2019
DOI: 10.1038/s41587-019-0072-8

Assembly of long, error-prone reads using repeat graphs

Abstract: The problem of genome assembly is ultimately linked to the problem of the characterization of all repeat families in a genome as a repeat graph. The key reason the de Bruijn graph emerged as a popular short read assembly approach is because it offered an elegant representation of all repeats in a genome that reveals their mosaic structure. However, most algorithms for assembling long error-prone reads use an alternative overlap-layout-consensus (OLC) approach that does not provide a repeat characterization. We…
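To make the abstract's point about de Bruijn graphs and repeats concrete, here is a minimal Python sketch on a hypothetical toy genome with exact k-mers. It is not Flye's repeat-graph construction, which, per the abstract, targets long error-prone reads. The two copies of a repeat collapse into one shared path, and the repeat's boundaries appear as a node where paths merge and a node where they split. The genome string and function names are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(seq: str, k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; every k-mer in seq adds one edge prefix -> suffix."""
    graph: dict[str, list[str]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def repeat_boundaries(graph: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Return (nodes where distinct paths merge, nodes where they split)."""
    preds: dict[str, set[str]] = defaultdict(set)
    for node, succs in graph.items():
        for succ in succs:
            preds[succ].add(node)
    merges = {n for n, p in preds.items() if len(p) > 1}
    splits = {n for n, s in graph.items() if len(set(s)) > 1}
    return merges, splits


# Hypothetical toy genome: the repeat ACGTCA occurs twice with different flanks.
genome = "TTACGTCAGGCCACGTCAAA"
g = de_bruijn_graph(genome, k=4)
merges, splits = repeat_boundaries(g)
print(merges, splits)  # {'ACG'} {'TCA'}
print(g["CGT"])        # ['GTC', 'GTC']: the collapsed repeat edge is traversed twice
```

Edge multiplicity along the shared path records how many copies of the repeat the genome contains. This is the mosaic-repeat information that, according to the abstract, OLC approaches do not expose directly.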

Cited by 3,682 publications (3,016 citation statements). References: 42 publications.
“…A total of 47 Gbp (~99x coverage) of sequencing data was generated on the Pacific Biosciences Sequel sequencing machine. Flye version 2.3.3 (Kolmogorov et al 2019) was run on the PacBio sequencing data specifying a genome size of 700 Mbp (which is between 0.5x and 2.0x the expected genome size of related marine fishes; http://www.genomesize.com/) and otherwise default options. We ran BUSCO version 3.0.1 to assess genome assembly completeness (Simão et al 2015).…”
Section: Genome Sequencing and Draft Assembly
mentioning
confidence: 99%
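The statement above quotes a data yield, a coverage and the genome size passed to Flye, but it does not state the expected genome size directly. Since coverage is total bases divided by genome size, the quoted numbers imply one. The Python sketch below is only arithmetic on those quoted values, not the authors' own calculation. It checks that 700 Mbp falls inside the 0.5x to 2.0x window mentioned in the statement.

```python
# Values quoted in the citation statement above.
total_bases = 47e9        # 47 Gbp of PacBio Sequel data
reported_coverage = 99    # "~99x coverage"
flye_genome_size = 700e6  # genome size passed to Flye

# coverage = total bases / genome size, so genome size = total bases / coverage.
implied_genome_size = total_bases / reported_coverage
print(f"implied genome size: {implied_genome_size / 1e6:.0f} Mbp")  # implied genome size: 475 Mbp

# Coverage you would get if the genome really were 700 Mbp.
print(f"coverage at 700 Mbp: {total_bases / flye_genome_size:.1f}x")  # coverage at 700 Mbp: 67.1x

# How far the Flye setting is from the implied size.
ratio = flye_genome_size / implied_genome_size
print(f"ratio: {ratio:.2f}, within 0.5x-2.0x: {0.5 <= ratio <= 2.0}")  # ratio: 1.47, within 0.5x-2.0x: True
```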
“…canariae NCTC 14382ᵀ was previously sequenced on an Illumina HiSeq 2500 at Public Health England using the Nextera XP library preparation kit, following a retrospective study of yersiniosis isolates cultured from patients between April 2004 and March 2018 (8). For ONT MinION data, the run metrics were inspected using NanoPlot (version 1.0) (14) before raw FAST5 files were base-called to FASTQ files using Guppy (version 3.2.2) with the high-accuracy model. Adapters were trimmed from the raw reads by Porechop (version 0.2.4) using default parameters for SQK-RAD004, and the genome was then assembled de novo with Flye (version 2.5) (15,16). The best assembly parameters were determined empirically. They included the option flags "meta" and "plasmid", with coverage reduced to 30X for initial contig assembly, based on a predicted genome size of ~4.73 Mbp informed by de novo assembly of the short-read Illumina data (17).…”
Section: Genome Features
mentioning
confidence: 99%
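The statement above reduces coverage to 30X of a ~4.73 Mbp genome for the initial contig assembly. The Python sketch below illustrates one plausible way to pick such a subset: keep the longest reads until their total length reaches the target coverage times the genome size. The longest-first rule, the function name and the toy read lengths are assumptions made for illustration. The statement does not describe how Flye chooses the reads.

```python
def select_longest_reads(read_lengths: list[int], genome_size: int,
                         target_coverage: float) -> list[int]:
    """Hypothetical helper: indices of the longest reads, taken longest first,
    until their summed length reaches target_coverage * genome_size."""
    budget = target_coverage * genome_size
    order = sorted(range(len(read_lengths)), key=lambda i: read_lengths[i], reverse=True)
    chosen: list[int] = []
    total = 0
    for i in order:
        if total >= budget:
            break
        chosen.append(i)
        total += read_lengths[i]
    return chosen


# Base budget for 30X of a ~4.73 Mbp genome.
print(f"{30 * 4_730_000 / 1e6:.1f} Mbp")  # 141.9 Mbp

# Toy example: a 1 kb "genome" at 30X needs 30,000 bases.
print(select_longest_reads([5000, 20000, 12000, 8000], genome_size=1000, target_coverage=30))  # [1, 2]
```

Favouring long reads when coverage must be capped is a common heuristic, because longer reads span more repeats. The loop stops as soon as the budget is met, so the final selection can overshoot it by up to one read.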
“…minimizers [18], homopolymer-compressed k-mers [14], MinHash [17], etc.). The reduced long-read representation is appropriate for quickly detecting overlaps longer than 2 kb [14,16,17]. The newest long-read assemblers are therefore starting to be good at goal 3 as well [14,16,17].…”
mentioning
confidence: 99%
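The statement above lists compact read representations used to detect overlaps quickly. The Python sketch below illustrates two of them, homopolymer compression and (w, k)-minimizers, on a hypothetical pair of reads. MinHash is not shown. For readability, k-mers are ordered lexicographically here. Production tools typically order them by a hash function instead. The function names and parameters are illustrative assumptions.

```python
def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical bases to a single base (e.g. AAAC -> AC)."""
    out: list[str] = []
    for base in seq:
        if not out or out[-1] != base:
            out.append(base)
    return "".join(out)


def minimizers(seq: str, k: int, w: int) -> set[tuple[int, str]]:
    """(w, k)-minimizers: the smallest k-mer in each window of w consecutive k-mers.

    Returns (position, k-mer) pairs; ties go to the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    result: set[tuple[int, str]] = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        best = min(range(w), key=lambda j: (window[j], j))
        result.add((start + best, window[best]))
    return result


# Two toy reads of the same locus that differ only in homopolymer run lengths.
read_a = "ACGGTTTACA"
read_b = "ACGTTACA"
hpc_a, hpc_b = homopolymer_compress(read_a), homopolymer_compress(read_b)
print(hpc_a, hpc_b)                          # ACGTACA ACGTACA
print(sorted(minimizers(hpc_a, k=3, w=3)))   # [(0, 'ACG'), (1, 'CGT'), (4, 'ACA')]
```

The two reads differ only in homopolymer lengths, a typical long-read error. They therefore compress to the same string and share every minimizer. An overlapper can treat that kind of shared-seed signal as evidence of a candidate overlap while storing far fewer k-mers than the full read contains.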
“…The reduced long-read representation is appropriate for quickly detecting overlaps longer than 2 kb [14,16,17]. The newest long-read assemblers are therefore starting to be good at goal 3 as well [14,16,17]. However, assembling uncorrected long reads has the undesirable effect of giving more work to the consensus polisher [15,17,19–21].…”
mentioning
confidence: 99%
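The statement above notes that assembling uncorrected reads shifts work onto the consensus polisher. The Python sketch below shows what "consensus" means in its simplest form: a column-wise majority vote over reads that are assumed to be already aligned to equal length, with '-' marking gaps. It is a toy illustration, not the method of any polisher cited above. Real polishers use much richer error models. The function name and example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Column-wise majority vote over equal-length aligned reads.

    '-' marks a gap; columns whose most common symbol is a gap are dropped.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus: list[str] = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical aligned reads, each with one error (insertion, deletion or substitution).
reads = [
    "ACGT-ACGT",
    "ACGTTACGT",  # extra T
    "AC-T-ACGT",  # missing G
    "ACGT-ACCT",  # G -> C substitution
]
print(majority_consensus(reads))  # ACGTACGT
```

Each error appears in only one read, so the vote removes it. Uncorrected long reads carry many more such errors per read, which is why more of the correction falls to this final step.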