2015
DOI: 10.1038/nbt.3238

Assembling large genomes with single-molecule sequencing and locality-sensitive hashing

Abstract: Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) f…
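
The abstract names MinHash, a locality-sensitive hashing technique, as the basis for finding overlaps between noisy reads. The following is a minimal Python sketch of the general MinHash idea only, not MHAP's actual implementation: each read is reduced to a fixed-length sketch of minimum k-mer hash values, and the fraction of matching sketch positions estimates the Jaccard similarity of the two k-mer sets. The function names, the choice of BLAKE2b as the hash family, and the default values of `k` and `num_hashes` are all illustrative assumptions.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(seq, k=16, num_hashes=128):
    """Build a sketch: for each seeded hash function, keep the minimum value
    over all k-mers of the sequence. Parameter defaults are illustrative."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(seeded_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of sketch positions that agree."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


if __name__ == "__main__":
    read_a = "ACGTTGCAAGCTTGACCGTAGGCTAACGTTAGC"
    read_b = "TTGCAAGCTTGACCGTAGGCTAACGTTAGCCAT"  # overlaps most of read_a
    sa = minhash_sketch(read_a, k=8)
    sb = minhash_sketch(read_b, k=8)
    print(f"estimated Jaccard similarity: {estimate_jaccard(sa, sb):.2f}")
```

Because sketches have a fixed size regardless of read length, comparing two reads costs time proportional to the sketch length rather than to the reads themselves, which is what makes this family of methods attractive for all-versus-all overlap detection.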

Cited by: 884 publications (740 citation statements)
References: 83 publications
“…Read clustering approaches are beneficial for assembling highly similar sequences with PacBio data (11, 14–16). To integrate this with our assembly pipeline, we used the PacBio SMRTanalysis HGAP3 algorithm (11) with empirically determined custom parameters (see GitHub repository: https://github.com/paajanen/RenSeq).…”
Section: Results
mentioning
confidence: 99%
“…The last step called the consensus contigs from the output files of the two previous steps and the raw PacBio reads. In the PacBio‐only assembly, the raw reads from the PacBio platform were first corrected using the PBcR pipeline (Berlin et al., 2015) with the self‐correction feature enabled and the minimum length of PacBio fragments to keep set to 500. The assembly was done using the Celera Assembler (CA) version 8.3rc2 (Myers et al., 2000) with default parameters.…”
Section: Methods
mentioning
confidence: 99%
“…An essential component of SMRT sequencing is the zero-mode waveguide (ZMW) [4], a zeptolitre-volume cylindrical cavity (~100 nm diameter and height) in which the DNA/polymerase complex is immobilised [4]. Major advantages of SMRT sequencing over second-generation sequencing methods include long average read lengths of more than 10,000 bases and lack of GC% bias [3, 5, 6], critical for gap-free sequencing, and the ability to directly detect DNA base modifications by monitoring polymerase kinetics [2]. Apart from DNA sequencing, ZMWs have been exploited for single molecule RNA sequencing/epigenetics [7] and a variety of other single-molecule studies [8–13].…”
mentioning
confidence: 99%