2020
DOI: 10.1038/s41587-020-0503-6

Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes

Abstract: De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled eleven highly contiguous human genomes de novo in nine days. We achieved ~63x cove…

Citations: cited by 452 publications (499 citation statements)
References: 66 publications
“…2a). ONT data were not considered for further benchmarking due to practical issues concerning systematic base call errors, consistency, and scalability at the time (early 2017) 39; however the technology has since improved in these areas 40 and will be reconsidered in future phases of the VGP, as will PacBio's recently released HiFi circular consensus sequencing (CCS) 41. assembly pipeline applied across multiple species.…”
Section: Iterative Assembly Pipeline (mentioning)
confidence: 99%
“…Collaborators evaluated 5 False Positive SNVs, 5 False Positive Indels, 5 False Negative SNVs, 5 False Negative Indels both inside and outside v3.3.2 along with 5 False Positive SNVs, 5 False Positive Indels, 5 False Negative SNVs, 5 False Negative Indels in the MHC for GRCh37. We generated IGV sessions with BAM files for Illumina HiSeq, 10x Genomics, PacBio HiFi 15kb & 20 kb merged, and ONT Ultralong 11 , then asked that the evaluators identify for each site if both alleles in the benchmark were correct and if both alleles in the query call set were correct.…”
Section: Evaluation Of the Benchmark (mentioning)
confidence: 99%
“…5 These benchmarks and benchmarking tools helped enable the development and optimization of new technologies and bioinformatics approaches, including linked reads, 6 highly accurate long reads, 7 deep learning-based variant callers, 8,9 graph-based variant callers, 10 and de novo assembly. 11,12 However, these benchmarks did not cover some challenging regions that these new methods could access, including many known medically relevant genes. 13,14 This limitation highlighted the need for improved benchmarks covering segmental duplications, the Major Histocompatibility Complex (MHC), and other challenging regions.…”
Section: Introduction (mentioning)
confidence: 99%
“…Canu version 2.0, Flye version 2.7, Miniasm/Minipolish version 0.1.3 (35), Raven version 1.1.10 (36), NECAT version 0.01 (37), wtdbg2 version 2.5 (38), and shasta version 0.5.1 (39). All assemblers were run with default parameters (flagging raw or corrected reads depending on read input, Raven was run with the --weaken flag when corrected reads were used).…”
Section: Validation Of Assembly and Comparison Of Long Read Assembler (mentioning)
confidence: 99%
“…Different isolates (variants) of the same species have been found to vary greatly in their phenotypes (16), but due to the relatively small number of isolates sequenced, the extent of genomic variation between strains is poorly understood. Owing to their genomes having multiple chromosomes that contribute to their relatively large genome sizes (30-45) in comparison to bacterial microbes (around 5 Mb), de novo genome assemblies of Metarhizium spp. using first generation sequencing is very costly, and second-generation sequencing results in assemblies that are highly contiguous, falling apart around repeat rich and homologous regions of the genome.…”
Section: Introduction (mentioning)
confidence: 99%