HIV Haplotype Inference Using a Propagating Dirichlet Process Mixture Model

Prabhakaran, Sandhya; Rey, Melanie; Zagordi, Osvaldo; Beerenwinkel, Niko; Roth, Volker

doi:10.1109/tcbb.2013.145

Cited by 100 publications

(158 citation statements)

References 15 publications


“…None of the other tools generated even only one such perfectly matching segment, possibly because they require much higher coverage. This improvement in terms of accuracy may be due to the probabilistic model that treats error… [Table caption:] Global haplotype assembly comparison of HaploClique with the software packages ShoRAH [33], PredictHaplo [14], and QuRe [16]. We report the estimated variant frequencies and, in parentheses, the maximal length of the reconstructed haplotypes relative to the genome length, for each of the five variants.…”

Section: Discussion (mentioning)

confidence: 99%

“…Fourth, we evaluate the quality of the local and global haplotypes that HaploClique predicts. Lastly, we compare HaploClique to state-of-the-art tools ShoRAH [33], PredictHaplo [14], and QuRe [16] in quasispecies reconstruction of a simulated five virus mix of well-known HIV-1 lab-strains.…”

Section: Simulation Studies (mentioning)

confidence: 99%

“…We performed haplotype reconstruction for the lab-mix using the tools ShoRAH [33], PredictHaplo [14], and QuRe [16], and compared the results to those of HaploClique (Table 1). Hapler v1.60 [19] did not accept NGS alignments with insertions, and hence was not applicable to this heterogeneous virus population.…”

Section: Simulation Studies (mentioning)

confidence: 99%

“…For quasispecies assembly, approaches from different domains have been developed: (i) probabilistic mixture models [14], (ii) hidden Markov models [15], (iii) sampling schemes [16], (iv) combinatorial approaches based on analyzing the read overlap graph [8, 17–19], (v) coloring of overlap and conflict graphs by constraint programming [20], and (vi) exploiting the "identical by descent" information [21] in the HapCompass framework [22], originally designed for diploid single nucleotide polymorphism data.…”

Section: Introduction (mentioning)

confidence: 99%


Viral Quasispecies Assembly via Maximal Clique Enumeration

Töpfer, Marschall, Bull, et al. 2014

Lecture Notes in Computer Science

Self Cite


Virus populations can display high genetic diversity within individual hosts. The intra-host collection of viral haplotypes, called viral quasispecies, is an important determinant of virulence, pathogenesis, and treatment outcome. We present HaploClique, a computational approach to reconstruct the structure of a viral quasispecies from next-generation sequencing data as obtained from bulk sequencing of mixed virus samples. We develop a statistical model for paired-end reads accounting for mutations, insertions, and deletions. Using an iterative maximal clique enumeration approach, read pairs are assembled into haplotypes of increasing length, eventually enabling global haplotype assembly. The performance of our quasispecies assembly method is assessed on simulated data for varying population characteristics and sequencing technology parameters. Owing to its paired-end handling, HaploClique compares favorably to state-of-the-art haplotype inference methods. It can reconstruct error-free full-length haplotypes from low coverage samples and detect large insertions and deletions at low frequencies. We applied HaploClique to sequencing data derived from a clinical hepatitis C virus population of an infected patient and discovered a novel deletion of length 357 ± 167 bp that was validated by two independent long-read sequencing experiments. HaploClique is available at https://github.com/armintoepfer/haploclique. A summary of this…


“…Optimizing its set of parameters is non-trivial. PredictHaplo [38] is also capable of performing full recoveries of the HIV-1 non-env genes, but has been shown to recover conservatively outside of HIV-1 [39,40].…”

Section: Methodological Comparison Of Our Approach (mentioning)

confidence: 99%

Probabilistic recovery of cryptic haplotypes from metagenomic data

Nicholls, Aubrey, Grave, et al. 2017

Preprint


The cryptic diversity of microbial communities represents an untapped biotechnological resource for biomining, biorefining and synthetic biology. Revealing this information requires the recovery of the exact sequence of DNA bases (or "haplotype") that constitutes the genes and genomes of every individual present. This is a computationally difficult problem complicated by the requirement for environmental sequencing approaches (metagenomics) due to the resistance of the constituent organisms to culturing in vitro.

Haplotypes are identified by their unique combination of DNA variants. However, standard approaches for working with metagenomic data require simplifications that violate assumptions in the process of identifying such variation. Furthermore, current haplotyping methods lack objective mechanisms for choosing between alternative haplotype reconstructions from microbial communities. To address this, we have developed a novel probabilistic approach for reconstructing haplotypes from complex microbial communities and propose the "metahaplome" as a definition for the set of haplotypes for any particular genomic region of interest within a metagenomic dataset. Implemented in the twin software tools Hansel and Gretel, the algorithm performs incremental probabilistic haplotype recovery using Naive Bayes, an efficient and effective technique. Our approach is capable of reconstructing the haplotypes with the highest likelihoods from metagenomic datasets without a priori knowledge or making assumptions of the distribution or number of variants. Additionally, the algorithm is robust to sequencing and alignment error without altering or discarding observed variation and uses all available evidence from aligned reads. We validate our approach using synthetic metahaplomes constructed from sets of real genes, and demonstrate its capability using metagenomic data from a complex HIV-1 strain mix. The results show that the likelihood framework can allow recovery from microbial communities of cryptic functional isoforms of genes with 100% accuracy.

bioRxiv preprint (not peer-reviewed), first posted online Mar. 17, 2017; doi: http://dx.doi.org/10.1101/117838. Made available under a CC-BY 4.0 International license.

Genomic research is progressing beyond the use of consensus DNA sequences to represent species, towards the ultimate goal of complete characterisation of the genetic diversity that exists across their populations. So far, research has focused on characterising specific aspects of this diversity, for example: identifying the entire gene-set of all strains of a species (the pangenome) [1]; identifying the groups of genes (or genetic variants within) that are inherited together in organisms across entire populations (the haplome) [2]; or, in viruses, identifying strains related by mutations in a highly mutagenic environment (the quasispecies) [3]. However, many communities (and especially microbial communities) maintain a fine balance b...


Probabilistic Viral Quasispecies Assembly

Töpfer, Beerenwinkel 2016

Computational Methods for Next Generation Sequencing Data Analysis

