2016
DOI: 10.21105/joss.00116

RAILS and Cobbler: Scaffolding and automated finishing of draft genomes using long DNA sequences

Abstract: Despite major advances in DNA sequencing technologies we do not yet have complete genome sequences. Producing high-quality, contiguous, draft assemblies de novo is of paramount importance as it informs on genetic content and organization of the genome (Pagani et al. 2012). The past decade has seen improvements in sequence throughput, a substantially lower DNA sequencing cost and increased read lengths. Whereas the base accuracy of short (currently ~250 bp) read lengths such as those from Illumina have improved …

Citations: cited by 42 publications (43 citation statements)
References: 7 publications
“…We followed a scaffolding and gap-filling methodology similar to that of the recently published bullfrog genome [ 5 ]. Gaps in our initial, supernova draft assembly were filled with Cobbler (version 0.3, Canada’s Michael Smith Genome Sciences Centre) using parameters -d 100 -i 0.95 [ 6 ] utilizing contig sequences from the ABySS assemblies generated at three kmer values ( k 95, k 100, k 105). The subsequent gap-filled assembly was initially scaffolded with RAILS (version 1.2, Canada’s Michael Smith Genome Sciences Centre) using parameters -d 100 -i 0.95 [ 6 ] using scaffold sequences from the same three ABySS assemblies.…”
Section: Methods Results and Discussion (citation type: mentioning)
confidence: 99%
“…Gaps in our initial, supernova draft assembly were filled with Cobbler (version 0.3, Canada’s Michael Smith Genome Sciences Centre) using parameters -d 100 -i 0.95 [ 6 ] utilizing contig sequences from the ABySS assemblies generated at three kmer values ( k 95, k 100, k 105). The subsequent gap-filled assembly was initially scaffolded with RAILS (version 1.2, Canada’s Michael Smith Genome Sciences Centre) using parameters -d 100 -i 0.95 [ 6 ] using scaffold sequences from the same three ABySS assemblies. Briefly, long sequences are aligned against a draft assembly using BWA-MEM (version 0.7.13), using parameters -a -t 16 [ 7 ], and the resulting alignments are parsed and inspected.…”
Section: Methods Results and Discussion (citation type: mentioning)
confidence: 99%
“…We assessed the improvement to the assembly after each round of scaffolding using the NG50 length metric and the number of complete and partial core eukaryotic genes (CEGs) using CEGMA, which reports a proxy metric for assembly completeness in the genic space (Parra et al 2009). Using the Synthetic Long-Reads (SLR) and the Kollector (Kucuk, et al, in press) targeted gene assembly (TGA) tool, RAILS (Warren 2016) merged over 56 thousand scaffolds; this permitted the recovery of an additional four partial CEGs, and raised the contiguity of the assembly to approximately 30 kbp (Supplemental Table S1). The most dramatic improvements to assembly contiguity and resolved CEGs were obtained using LINKS (Warren et al 2015a) and the MPET reads (NG50 increase of ~16 kbp and 10 additional complete CEGs; Supplemental Table S1), followed by the combined Kollector TGA and the lower-k whole genome assembly (~8 kbp improvement to NG50 and 9 additional complete CEGs; Supplemental Table S1).…”
Section: Results (citation type: mentioning)
confidence: 99%
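The NG50 metric used in the statement above can be illustrated with a short sketch. This is not code from RAILS or from the citing study; the function name `ng50` and the toy lengths are assumptions made for illustration. It uses the usual definition: the length of the sequence at which the running total of lengths, sorted longest first, reaches half of the estimated genome size.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of scaffold lengths.

    Hypothetical helper for illustration only. Scaffolds are visited
    from longest to shortest; the NG50 is the length of the scaffold
    at which the running total first reaches half of genome_size.
    Returns 0 if the assembly covers less than half the genome.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


if __name__ == "__main__":
    # Toy scaffold lengths and genome size, not data from any study.
    toy_lengths = [50, 40, 30, 20, 10]
    print(ng50(toy_lengths, genome_size=200))
```

Unlike N50, which is relative to the total assembly length, NG50 is relative to a fixed genome size, so it stays comparable across successive scaffolding rounds of the same genome.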
“…The resulting ABySS scaffold assembly (k = 160) was re-scaffolded with RAILS version 0.1 (Warren, 2016) (ftp://ftp.bcgsc.ca/supplementary/RAILS -d 250 -i 0.99) using both Moleculo long reads and Kollector (Kucuk, et al, submitted) targeted gene reconstructions (TGA; Supplemental Table 1). In RAILS, long sequences are aligned against a draft assembly (BWA-MEM V0.7.13-r1126 (Li, 2013) -a -t16), and the alignments are parsed and inspected, tracking the position and orientation of each in assembly draft sequences, satisfying minimum alignment requirements (at minimum 250 anchoring bases with 99% sequence identity or more used in this study).…”
Section: Assembly Process (citation type: mentioning)
confidence: 99%
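The minimum alignment requirements quoted above (at least 250 anchoring bases at 99% identity or more, matching `-d 250 -i 0.99`) can be sketched as a simple filter. This is a minimal sketch under stated assumptions, not the RAILS implementation: the `Alignment` record, its field names, and both functions are hypothetical, and identity is assumed to be matching bases divided by aligned bases. Parsing real BWA-MEM output is left out.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified alignment record (not a SAM parser)."""
    query: str           # long sequence name
    target: str          # draft assembly sequence name
    target_start: int    # position of the alignment on the target
    strand: str          # "+" or "-", orientation of the alignment
    aligned_bases: int   # number of anchoring bases
    matching_bases: int  # aligned bases that match the target


def passes_anchor_filter(aln, min_anchor=250, min_identity=0.99):
    """Keep alignments with enough anchoring bases and high enough identity."""
    if aln.aligned_bases < min_anchor:
        return False
    return aln.matching_bases / aln.aligned_bases >= min_identity


def anchored_targets(alignments, min_anchor=250, min_identity=0.99):
    """Group passing alignments by long sequence, tracking the position
    and orientation of each on the draft assembly sequences."""
    by_query = {}
    for aln in alignments:
        if passes_anchor_filter(aln, min_anchor, min_identity):
            by_query.setdefault(aln.query, []).append(
                (aln.target, aln.target_start, aln.strand)
            )
    return by_query


if __name__ == "__main__":
    # Toy records, not data from any study.
    toy = [
        Alignment("read1", "scafA", 900, "+", 300, 299),  # passes
        Alignment("read1", "scafB", 0, "+", 200, 200),    # too short
        Alignment("read1", "scafC", 10, "-", 400, 380),   # identity too low
        Alignment("read1", "scafD", 5, "-", 250, 248),    # passes
    ]
    print(anchored_targets(toy))
```

A long sequence that anchors to two different draft sequences, as `read1` does here, is the kind of evidence a scaffolder could use to join them; how RAILS actually decides on merges is not described in the quoted text.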
“…Further scaffolding and gap closure procedures were performed with Rails v1.2/Cobbler v0.3 pipeline script [19] to obtain the final consensus genome sequence named SP_G (ENA accession ID GCA_900499035.1) using the parameters anchoring sequence length ( −d 100) and minimum sequence identity ( −i 0.95). Three scaffolding and gap closure procedures were performed iteratively with one haplotype of the initial assembly as the assembly per se , and previous de novo assemblies from Supernova v1.2.2, (315M/100% and 450M/80% reads/barcodes).…”
Section: Data Description (citation type: mentioning)
confidence: 99%