2018
DOI: 10.1534/g3.118.200162

Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing

Abstract: Accurate and comprehensive characterization of genetic variation is essential for deciphering the genetic basis of diseases and other phenotypes. A vast amount of genetic variation stems from large-scale sequence changes arising from the duplication, deletion, inversion, and translocation of sequences. In the past 10 years, high-throughput short reads have greatly expanded our ability to assay sequence variation due to single nucleotide polymorphisms. However, a recent de novo assembly of a second Drosophila m…

Citations: cited by 94 publications (117 citation statements)
References: 62 publications
“…We showed that Canu yields high-quality genomes from the perspective of TE biology: many piRNA clusters are contiguously assembled, and the TE landscape is accurately captured. The high quality of assemblies generated by Canu was also noted in several previous works (Jayakumar and Sakakibara, 2017;de Lannoy et al, 2017;Solares et al, 2018;Wick and Holt, 2019). Interestingly, the best assemblies were obtained when solely a subset of the long reads was used for an assembly with Canu, i.e., 100x coverage with the longest reads.…”
Section: Assembly Strategy (supporting)
confidence: 71%
“…Long reads usually have high error rates, and assemblies based on these reads may thus also contain appreciable amounts of errors (Sović et al, 2016;Vaser et al, 2017). Following recommendations of previous works (Solares et al, 2018;Chakraborty et al, 2019;Ellison and Cao, 2020), we aimed to reduce the error rate by polishing the assembly with Racon (long reads) (Vaser et al, 2017) and Pilon (short reads) (Walker et al, 2014). Polishing algorithms align reads to an assembly and infer the consensus sequence Assemblies were either based on a random subset of the reads (random) or on the longest reads (longest).…”
Section: Optimizing the Assembly Strategy (mentioning)
confidence: 98%
“…DNA isolation and the preparation of SMRTbell libraries followed (Chin et al, 2016). The preparation of paired-end Illumina libraries followed RemoveChimera 1) were similar to those used for previous hybrid assemblies , Solares et al, 2018, except that the k-mer size was increased to 31. The k-mer size was increased to minimize the number of misassemblies by including 90% of all k-mers reported by the meryl program within the Canu package (Koren et al, 2017).…”
Section: Methods (mentioning)
confidence: 99%
“…To integrate information obtained from the different assembly methods -Canu, DBG2OLC and FALCON-Unzip -we opted for an iterative approach of assembly merging using quickmerge (Chakraborty et al, 2016), following a broader application of assembly merging based on (Solares et al, 2018). Quickmerge merges assemblies to increase the contiguity of the most complete (query) genome by taking advantage of the contiguity of the second reference sequence.…”
Section: Methods (mentioning)
confidence: 99%
“…To demonstrate the feasibility of our assembly strategy we applied LazyBastard to publicly available datasets [5,12,22] for three well studied model organisms, baker's yeast (S. cerevisiae, genome size . The data were downsampled to approximately 5× and 10× nanopore coverage for long reads, respectively, and Illumina coverage sufficient for short-read anchors.…”
Section: Results (mentioning)
confidence: 99%