2014
DOI: 10.1093/bioinformatics/btu484

Probabilistic single-individual haplotyping

Abstract: Motivation: Accurate haplotyping (determining from which parent particular portions of the genome are inherited) is still mostly an unresolved problem in genomics. This problem has only recently started to become tractable, thanks to the development of new long-read sequencing technologies. Here, we introduce ProbHap, a haplotyping algorithm targeted at such technologies. The main algorithmic idea of ProbHap is a new dynamic programming algorithm that exactly optimizes a likelihood function specified by a probab…

Cited by 60 publications (77 citation statements). References 19 publications.
“…We compared HAPCOL with three state-of-the-art haplotyping tools specifically designed for handling long reads, namely, REFHAP, which was shown to be one of the most accurate heuristic methods (Duitama et al, 2012), PROBHAP, a recent probabilistic method which has been shown to be sensibly more accurate than REFHAP (Kuleshov, 2014) and WHATSHAP, the first exact approach for the weighted MEC problem specifically designed for long reads (Patterson et al, 2014, 2015). At higher coverages, applications such as SNP calling or validating which SNPs are really heterozygous in the given sample (e.g.…”
Section: Results (mentioning; confidence: 99%)
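The weighted MEC (minimum error correction) objective mentioned above can be made concrete with a small sketch. The following are assumptions made here and are not taken from HAPCOL, PROBHAP or WHATSHAP: fragments are Python dicts mapping a SNP index to an observed allele (0 or 1), and every entry carries a non-negative weight (for example a confidence derived from base quality). The name `weighted_mec_cost` is hypothetical. The function only scores one candidate haplotype h together with its complement. It does not search for the best h.

```python
from typing import Dict, List

# Hypothetical fragment format: SNP index -> observed allele (0 or 1).
# Positions a read does not cover are simply absent from its dict.
Fragment = Dict[int, int]
# Hypothetical per-entry weights, keyed like the fragment they belong to.
Weights = Dict[int, float]


def weighted_mec_cost(haplotype: List[int],
                      fragments: List[Fragment],
                      weights: List[Weights]) -> float:
    """Weighted MEC cost of the pair (h, complement of h).

    Each fragment is assigned to whichever of h or its complement it
    disagrees with less (by total weight); the cost is the summed weight
    of the entries that would have to be flipped.
    """
    total = 0.0
    for frag, w in zip(fragments, weights):
        # Weight of entries that disagree with h.
        cost_h = sum(w[j] for j, a in frag.items() if a != haplotype[j])
        # Weight of entries that disagree with the complement of h.
        cost_comp = sum(w[j] for j, a in frag.items() if a == haplotype[j])
        total += min(cost_h, cost_comp)
    return total


if __name__ == "__main__":
    frags = [{0: 0, 1: 1, 2: 0}, {1: 1, 2: 1, 3: 1}, {0: 1, 1: 0}]
    unit = [{j: 1.0 for j in f} for f in frags]
    print(weighted_mec_cost([0, 1, 0, 1], frags, unit))  # → 1.0
```

With all weights equal to 1 the score reduces to the plain, unweighted MEC count. In the example, only the second read conflicts with the candidate pair, at SNP 2.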
“…Two recent articles (Kuleshov, 2014; Patterson et al, 2014) aim at processing future-generation long reads by introducing algorithms exponential in the sequencing coverage, a parameter which is not expected to grow as fast as read length with the advent of future-generation technologies. The first algorithm, called PROBHAP (Kuleshov, 2014), is a probabilistic dynamic programming algorithm that optimizes a likelihood function generalizing the objective function of MEC. Although PROBHAP is significantly slower than the previous heuristics, it obtained a noticeable improvement in accuracy.…”
Section: Introduction (mentioning; confidence: 99%)
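The statement above says that a likelihood function can generalize the MEC objective. A short derivation shows one way this can hold. It uses a deliberately simple model, which is an assumption made here and is not the model used by PROBHAP. Read r observes alleles x_{rj} at its SNP set J_r. It is drawn from either the haplotype h or its complement \bar h. Every observed allele is wrong independently with the same probability ε < 1/2.

```latex
% Assumed model (illustrative only): independent per-entry error rate \varepsilon < 1/2.
% d(r,h) counts the mismatches between read r and haplotype h.
\begin{align*}
d(r,h) &= \sum_{j \in J_r} [\,x_{rj} \neq h_j\,] \\
\log P(r \mid h) &= |J_r| \log(1-\varepsilon) - d(r,h)\,\log\frac{1-\varepsilon}{\varepsilon} \\
\max_{h} \sum_{r} \max_{s \in \{h,\bar h\}} \log P(r \mid s)
  &= \sum_{r} |J_r| \log(1-\varepsilon)
   - \log\frac{1-\varepsilon}{\varepsilon}\;
     \min_{h} \underbrace{\sum_{r} \min\bigl(d(r,h),\, d(r,\bar h)\bigr)}_{\mathrm{MEC}(h)}
\end{align*}
```

When ε < 1/2, log((1-ε)/ε) > 0, so the haplotype that maximizes this likelihood is exactly the one that minimizes MEC. If the error probability is instead allowed to vary per entry (ε_{rj}), the mismatch count becomes a sum of per-entry weights log((1-ε_{rj})/ε_{rj}). That is a weighted MEC objective.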
“…
• they assume that the solution is a pair of haplotypes from diploid parents, and discard/alter observations until a pair of haplotypes can be determined [17,34]
• they discard SNP sites that feature three or more alleles as errors [34]
• they can generate an unrealistically large number of unordered potential haplotypes [4,35]
• they are too computationally expensive for high-depth short-read data sets [36]
• they require a good-quality reference genome [37]
• they are no longer maintained, are specific to certain data, or cannot be installed [38]
It is important to note that no other tool that claims to recover haplotypes or strains from a microbial population has attempted to validate its work biologically.…”
Section: Discussion: Comparison to Related Work (mentioning; confidence: 99%)
“…With the advent of long-read technology, ProbHap recognised a niche in applying computationally expensive dynamic programming solutions to low-coverage long reads [41]. These solutions are inappropriate for high-depth short-read data sets as the run time increases exponentially with coverage.…”
Section: Methodological Comparison of Our Approach (mentioning; confidence: 99%)
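The passage above explains why run time grows exponentially with coverage. A minimal sketch makes the source of that growth visible. It is an exact DP for the unweighted MEC problem, in the style of the bipartition DP used by exact MEC solvers such as WhatsHap. It is not ProbHap's likelihood DP. The state at each SNP column is an assignment of every read whose span covers that column to one of the two haplotypes, so a column covered by c reads has 2^c states. The fragment format (dict from SNP index to allele 0/1) and the name `exact_mec` are hypothetical. The sketch assumes that every fragment is non-empty.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical fragment format: SNP index -> observed allele (0 or 1).
Fragment = Dict[int, int]


def exact_mec(fragments: List[Fragment], n_snps: int) -> Tuple[int, List[int]]:
    """Exact unweighted MEC by DP over bipartitions of the active reads.

    A read is 'active' at column j if j lies within its span (gaps included),
    so column j has 2**(number of active reads) candidate assignments.
    """
    spans = [(min(f), max(f)) for f in fragments]
    active = [[r for r, (lo, hi) in enumerate(spans) if lo <= j <= hi]
              for j in range(n_snps)]

    def column_cost(j: int, side_of: Dict[int, int]) -> Tuple[int, int]:
        # Pick the allele of haplotype 0 at column j (haplotype 1 gets the
        # other allele) that minimises the number of entries to flip.
        best = None
        for allele in (0, 1):
            cost = sum(1 for r, side in side_of.items()
                       if j in fragments[r]
                       and fragments[r][j] != (allele if side == 0 else 1 - allele))
            if best is None or cost < best[0]:
                best = (cost, allele)
        return best

    # Each table maps an assignment tuple to
    # (cumulative cost, predecessor assignment, allele of haplotype 0 here).
    prev: Dict[tuple, tuple] = {(): (0, None, None)}  # virtual column -1
    prev_active: List[int] = []
    tables = []
    for j in range(n_snps):
        cur = {}
        for assign in product((0, 1), repeat=len(active[j])):
            side_of = dict(zip(active[j], assign))
            cost, allele = column_cost(j, side_of)
            # Cheapest predecessor that keeps shared reads on the same side.
            best = None
            for p_assign, (p_cost, _, _) in prev.items():
                if all(side_of[r] == s for r, s in zip(prev_active, p_assign)
                       if r in side_of):
                    if best is None or p_cost < best[0]:
                        best = (p_cost, p_assign)
            cur[assign] = (best[0] + cost, best[1], allele)
        tables.append(cur)
        prev, prev_active = cur, active[j]

    if not tables:
        return 0, []
    # Trace back from the cheapest assignment in the last column.
    assign = min(tables[-1], key=lambda a: tables[-1][a][0])
    total = tables[-1][assign][0]
    haplotype = [0] * n_snps
    for j in range(n_snps - 1, -1, -1):
        _, p_assign, allele = tables[j][assign]
        haplotype[j] = allele
        assign = p_assign
    return total, haplotype
```

For example, `exact_mec([{0: 0, 1: 1, 2: 0}, {1: 1, 2: 1, 3: 1}, {0: 1, 1: 0}], 4)` reports an optimal cost of 1. For brevity, the predecessor scan compares every pair of states. A practical implementation would index predecessors by the assignment of the reads shared between adjacent columns. Even so, the number of states per column stays exponential in coverage.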
“…The more recent ViQuaS [40] reports higher recall than PredictHaplo, as expected for an algorithm based on an overlap assembler. However, its precision is influenced by the quality of the available reference for the post-assembly filtering step, making ViQuaS less suitable for the analysis of a metagenome, where a good reference is unavailable. With the advent of long-read technology, ProbHap recognised a niche in applying computationally expensive dynamic programming solutions to low-coverage long reads [41]. These solutions are inappropriate for high-depth short-read data sets as the run time increases exponentially with coverage.…”
(mentioning; confidence: 99%)
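A worked count shows how "exponentially with coverage" plays out. Assume a DP whose state at an SNP column is a bipartition of the c reads covering that column. This state space is an assumption made for illustration and is not a description of ProbHap's implementation. The coverage values below are illustrative and are not measurements.

```latex
% States per SNP column when the c active reads are split between two haplotypes,
% counting a bipartition and its mirror image (haplotypes swapped) once.
\begin{align*}
\#\text{states per column} &= 2^{c-1} \\
c = 10:&\quad 2^{9} = 512 \\
c = 20:&\quad 2^{19} = 524\,288 \\
c = 40:&\quad 2^{39} \approx 5.5 \times 10^{11}
\end{align*}
```

Doubling coverage from 20 to 40 multiplies the per-column state count by 2^{20} ≈ 10^6, and this cost is paid at every SNP column. This is consistent with the statements above that such methods target low-coverage long reads rather than high-depth short-read data sets.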