2021
DOI: 10.1101/2021.01.25.428049
Preprint

Haploflow: Strain-resolved de novo assembly of viral genomes

Abstract: In viral infections, multiple related viral strains are often present due to coinfection or within-host evolution. We describe Haploflow, a de Bruijn graph-based assembler for de novo genome assembly of viral strains from mixed sequence samples using a novel flow algorithm. We assessed Haploflow across multiple benchmark data sets of increasing complexity, showing that Haploflow is faster and more accurate than viral haplotype assemblers and generic metagenome assemblers not aiming to reconstruct strains. Hapl…
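To make the term "de Bruijn graph-based" concrete, here is a minimal sketch of how such a graph can be built from reads, with each edge weighted by how often its k-mer was seen. It is a generic textbook construction, not Haploflow's implementation: the function name `build_de_bruijn`, the toy reads, and the choice of k are all illustrative assumptions, and the flow algorithm from the abstract is not shown.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph whose edges carry k-mer counts.

    Nodes are (k-1)-mers; each distinct k-mer becomes one edge from its
    prefix to its suffix, weighted by its number of occurrences.
    Hypothetical helper for illustration only.
    """
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1

    graph = defaultdict(dict)
    for kmer, count in kmer_counts.items():
        graph[kmer[:-1]][kmer[1:]] = count
    return graph


# Toy mixture: three reads from one variant, one read from another
# that differs in the last base.
reads = ["ACGTA"] * 3 + ["ACGTC"]
graph = build_de_bruijn(reads, k=3)
branch = graph["GT"]  # the node where the two variants diverge
```

In this toy input the two variants share a path until node `GT`, where the graph branches into edges with different counts. Uneven edge weights at a branch are the kind of coverage signal a strain-aware assembler can exploit.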

Cited by 4 publications (5 citation statements)
References 81 publications
“…Assemblies were evaluated with MetaQUAST v.5.1.0rc [37], which was adapted for the evaluation of strain-resolved assembly (Supplementary text). To test the ability of the assemblers to generate near-complete strain-resolved genomes, we determined strain recall and precision, similar to [38] (Supplementary Table 3). Strain recall measures how many genomes are recovered with high genome fraction and few mismatches (mm).…”
Section: Overall Trends (mentioning)
confidence: 99%
“…Finally, the number of misassemblies describes the number of contigs which either contain a gap of more than 1 kb, contain inserts of more than 1 kb or align to different genomes. In addition to these metrics, we determined the strain recall and strain precision, similar to [38], to quantify the presence of high-quality, strain-resolved assemblies. Strain recall is defined as the fraction of high-quality (more than 90% genome fraction and less than 100 mismatches per 100 kb) genome assemblies recovered for all ground truth genomes.…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
“…The code of Haploflow is available with a GPLv3 license on Github under https://github.com/hzi-bifo/Haploflow [78]. The version (v0.2) used for the assemblies in this publication is available under the DOI https://doi.org/10.5281/zenodo.…”
Section: Reconstruction Of Full Length SARS-CoV-2 Sequences (mentioning)
confidence: 99%
“…As a common limitation among all nanopore sequencing kits, high sequencing error rates limit the accurate characterization of quasi-species, new strains or subtypes of a virus and de novo assembly of viral reads [46,59,60]. In a given infected plant, closely related viral strains can be present with high average nucleotide identity (NI), and the assembly of individual strains present in low abundance or with low variation is complicated and challenging [61,62]. Even so, the capability of direct RNA sequencing kit for identifying viral strains with 20% to 40% divergence in term of NI has been demonstrated [63].…”
Section: Detection Of Grapevine RNA Viruses By Nanopore Direct cDNA and RNA Sequencing (mentioning)
confidence: 99%
“…Even so, the capability of direct RNA sequencing kit for identifying viral strains with 20% to 40% divergence in term of NI has been demonstrated [63]. In addition, Haploflow, a new strain-resolving assembler, has been described, which considers the differential coverage between strains to deconvolute the assembly graph into strain resolved genome assemblies and has been used to reconstruct viral strain genomes from human cytomegalovirus positive samples and SARS-CoV-2 wastewater samples [61]. Also, using the full reference sequence of the virus and a BLAST search with phylogeny is recommended for viral phylogeny and viral variation studies [60].…”
Section: Detection Of Grapevine RNA Viruses By Nanopore Direct cDNA and RNA Sequencing (mentioning)
confidence: 99%
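The strain recall definition quoted in the Evaluation Metrics statement above can be written as a short calculation. This sketch assumes that a per-genome genome fraction (in percent) and a mismatch rate per 100 kb are already available, for example from an evaluation tool. The `GenomeEval` record, the function names, and the example values are hypothetical. Strain precision is not defined in the quoted text, so it is not implemented here.

```python
from dataclasses import dataclass


@dataclass
class GenomeEval:
    """Evaluation result for one ground truth genome (hypothetical record)."""
    name: str
    genome_fraction: float       # percent of the genome recovered, 0-100
    mismatches_per_100kb: float  # mismatch rate of the aligned assembly


def is_high_quality(ev, min_fraction=90.0, max_mismatches=100.0):
    """Quoted thresholds: more than 90% genome fraction and
    less than 100 mismatches per 100 kb."""
    return (ev.genome_fraction > min_fraction
            and ev.mismatches_per_100kb < max_mismatches)


def strain_recall(evals):
    """Fraction of ground truth genomes recovered as high-quality assemblies."""
    if not evals:
        return 0.0
    return sum(is_high_quality(ev) for ev in evals) / len(evals)


# Made-up values for four ground truth strains.
evals = [
    GenomeEval("strain_A", 98.5, 12.0),   # high quality
    GenomeEval("strain_B", 91.0, 250.0),  # too many mismatches
    GenomeEval("strain_C", 60.0, 5.0),    # genome fraction too low
    GenomeEval("strain_D", 95.2, 99.0),   # high quality
]
recall = strain_recall(evals)
```

With these made-up values, two of the four strains pass both thresholds, so the recall is 0.5.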