Haploflow: Strain-resolved de novo assembly of viral genomes

Fritz, Adrian; Bremges, Andreas; Deng, Zhi-Luo; Lesker, Till Robin; Götting, Jasper; Ganzenmüller, Tina; Sczyrba, Alexander; Dilthey, Alexander; Klawonn, Frank; McHardy, Alice C.

doi:10.1101/2021.01.25.428049

Cited by 4 publications

(5 citation statements)

References 81 publications


“…Assemblies were evaluated with MetaQUAST v.5.1.0rc [37], which was adapted for the evaluation of strain-resolved assembly (Supplementary text). To test the ability of the assemblers to generate near-complete strain-resolved genomes, we determined strain recall and precision, similar to [38] (Supplementary Table 3). Strain recall measures how many genomes are recovered with high genome fraction and few mismatches (mm).…”

Section: Overall Trends (mentioning)

confidence: 99%

“…Finally, the number of misassemblies describes the number of contigs which either contain a gap of more than 1 kb, contain inserts of more than 1 kb or align to different genomes. In addition to these metrics, we determined the strain recall and strain precision, similar to [38], to quantify the presence of high-quality, strain-resolved assemblies. Strain recall is defined as the fraction of high-quality (more than 90% genome fraction and less than 100 mismatches per 100 kb) genome assemblies recovered for all ground truth genomes.…”

Section: Evaluation Metrics (mentioning)

confidence: 99%
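
The strain recall definition quoted above can be illustrated with a minimal sketch. The two thresholds (more than 90% genome fraction, fewer than 100 mismatches per 100 kb) come from the quoted statement; the function name, the input layout and the sample values are hypothetical, and this is not the CAMI or MetaQUAST implementation.

```python
from typing import Dict, Tuple

# Thresholds taken from the quoted definition of a high-quality assembly.
MIN_GENOME_FRACTION = 90.0         # percent, exclusive lower bound
MAX_MISMATCHES_PER_100KB = 100.0   # exclusive upper bound


def strain_recall(
    best_per_genome: Dict[str, Tuple[float, float]],
    n_ground_truth: int,
) -> float:
    """Fraction of ground-truth genomes recovered as high-quality assemblies.

    best_per_genome (hypothetical layout) maps a ground-truth genome ID to
    (genome fraction in percent, mismatches per 100 kb) of its best assembly.
    Genomes without any assembly are simply absent from the mapping.
    """
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")
    recovered = sum(
        1
        for fraction, mismatches in best_per_genome.values()
        if fraction > MIN_GENOME_FRACTION and mismatches < MAX_MISMATCHES_PER_100KB
    )
    return recovered / n_ground_truth


# Made-up values for illustration only.
example = {
    "strain_A": (98.5, 12.0),    # passes both thresholds
    "strain_B": (72.0, 5.0),     # genome fraction too low
    "strain_C": (95.0, 240.0),   # too many mismatches
}
print(strain_recall(example, n_ground_truth=4))  # 1 of 4 genomes -> 0.25
```

A fourth ground-truth genome with no assembly at all is counted in the denominator only, which is why the mapping has three entries while `n_ground_truth` is 4.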


Critical Assessment of Metagenome Interpretation - the second round of challenges

Meyer, Fritz, Deng, et al. 2021

Preprint

Self Cite


Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the community-driven initiative for the Critical Assessment of Metagenome Interpretation (CAMI). In its second challenge, CAMI engaged the community to assess their methods on realistic and complex metagenomic datasets with long and short reads, created from ∼1,700 novel and known microbial genomes, as well as ∼600 novel plasmids and viruses. Altogether 5,002 results by 76 program versions were analyzed, representing a 22x increase in results. Substantial improvements were seen in metagenome assembly, some due to using long-read data. The presence of related strains still was challenging for assembly and genome binning, as was assembly quality for the latter. Taxon profilers demonstrated a marked maturation, with taxon profilers and binners excelling at higher bacterial taxonomic ranks, but underperforming for viruses and archaea. Assessment of clinical pathogen detection techniques revealed a need to improve reproducibility. Analysis of program runtimes and memory usage identified highly efficient programs, including some top performers with other metrics. The CAMI II results identify current challenges, but also guide researchers in selecting methods for specific analyses.




“…The code of Haploflow is available with a GPLv3 license on Github under https://github.com/hzi-bifo/Haploflow [78]. The version (v0.2) used for the assemblies in this publication is available under the DOI https://doi.org/10.5281/zenodo.…”

Section: Reconstruction of Full-Length SARS-CoV-2 Sequences (mentioning)

confidence: 99%

Haploflow: strain-resolved de novo assembly of viral genomes

et al. 2021


With viral infections, multiple related viral strains are often present due to coinfection or within-host evolution. We describe Haploflow, a de Bruijn graph-based assembler for de novo genome assembly of viral strains from mixed sequence samples using a novel flow algorithm. We assess Haploflow across multiple benchmark data sets of increasing complexity, showing that Haploflow is faster and more accurate than viral haplotype assemblers and generic metagenome assemblers not aiming to reconstruct strains. We show Haploflow reconstructs viral strain genomes from patient HCMV samples and SARS-CoV-2 wastewater samples identical to clinical isolates.


“…As a common limitation among all nanopore sequencing kits, high sequencing error rates limit the accurate characterization of quasi-species, new strains or subtypes of a virus and de novo assembly of viral reads [46,59,60]. In a given infected plant, closely related viral strains can be present with high average nucleotide identity (NI), and the assembly of individual strains present in low abundance or with low variation is complicated and challenging [61,62]. Even so, the capability of direct RNA sequencing kit for identifying viral strains with 20% to 40% divergence in term of NI has been demonstrated [63].…”

Section: Detection of Grapevine RNA Viruses by Nanopore Direct cDNA and RNA Sequencing (mentioning)

confidence: 99%

“…Even so, the capability of direct RNA sequencing kit for identifying viral strains with 20% to 40% divergence in term of NI has been demonstrated [63]. In addition, Haploflow, a new strain-resolving assembler, has been described, which considers the differential coverage between strains to deconvolute the assembly graph into strain resolved genome assemblies and has been used to reconstruct viral strain genomes from human cytomegalovirus positive samples and SARS-CoV-2 wastewater samples [61]. Also, using the full reference sequence of the virus and a BLAST search with phylogeny is recommended for viral phylogeny and viral variation studies [60].…”

Section: Detection of Grapevine RNA Viruses by Nanopore Direct cDNA and RNA Sequencing (mentioning)

confidence: 99%

Grapevine Virology in the Third-Generation Sequencing Era: From Virus Detection to Viral Epitranscriptomics

Javaran, Moffett, Lemoyne, et al. 2021

Preprint


Among all economically important plant species in the world, grapevine (Vitis vinifera L.) is the most cultivated fruit plant. It has a significant impact on the economies of many countries through wine and fresh and dried fruit production. In recent years, the grape and wine industry has been facing outbreaks of known and emerging viral diseases across the world. Although high-throughput sequencing (HTS) has been used extensively in grapevine virology, the application and potential of third-generation sequencing have not been explored in understanding grapevine viruses and their impact on the grapevine. Nanopore sequencing, a third-generation technology, can be used for direct sequencing of both RNA and DNA with minimal infrastructure. Compared to other HTS methods, the MinION nanopore platform is faster and more cost-effective and allows for long-read sequencing. Due to the size of the MinION device, it can be easily carried for field viral disease surveillance. This review article discusses grapevine viruses and their diagnostic methods, the principle of nanopore sequencing technology and its application in grapevine virus detection, virus–plant interactions, as well as the characterization of viral RNA modifications.

