2014
DOI: 10.1101/008003
Preprint

Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing

Abstract: We report reference-grade de novo assemblies of four model organisms and the human genome from single-molecule, real-time (SMRT) sequencing. Long-read SMRT sequencing is routinely used to finish microbial genomes, but the available assembly methods have not scaled well to larger genomes. Here we introduce the MinHash Alignment Process (MHAP) for efficient overlapping of noisy, long reads using probabilistic, locality-sensitive hashing. Together with Celera Assembler, MHAP was used to reconstruct the genomes of…

Cited by 189 publications (350 citation statements)
References 88 publications

“…De novo assembly of the genome sequencing data. De novo assembly of the long reads from SMRT Sequencing was performed using two assemblers: the Celera Assembler PBcR-MHAP pipeline 33 and Falcon 34 with different parameter settings. Quiver from SMRT Analysis v2.3.0 was used to polish base calling of contigs.…”
Section: Methods
mentioning
confidence: 99%
“…This suggests that contig break points occur at the start of repeats or that most assembly breaks are caused by other factors, such as within-genome heterozygosity or haplotype-specific structural variation. To test this, we also tried 'diploid-aware' assemblers Falcon (https://github.com/PacificBiosciences/falcon) and MinHash Alignment Process (MHAP) 14. These assemblies had similar metrics but were less contiguous overall (Extended Data Fig.…”
mentioning
confidence: 99%
“…We assembled the PacBio reads from A. filiculoides and S. cucullata genomes using PBcR 61 , and the resulting drafts were then polished by Quiver 62 (A. filiculoides) or Pilon 63 (S. cucullata). Plastid genomes were separately assembled using Mitobim 64 and annotated in Geneious 65 with manual adjustments.…”
Section: Methods
mentioning
confidence: 99%