2017
DOI: 10.1101/130930
Preprint

Sparse Tensor Decomposition for Haplotype Assembly of Diploids and Polyploids

Abstract: A framework that formulates haplotype assembly as sparse tensor decomposition is proposed. The problem is cast as that of decomposing a tensor having special structural constraints and missing a large fraction of its entries into a product of two factors, U and V; tensor V reveals haplotype information while U is a sparse matrix encoding the origin of erroneous sequencing reads. An algorithm, AltHap, which reconstructs haplotypes of either diploid or polyploid organisms by solving this decomposition problem is…
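The abstract casts assembly as factoring a partially observed read-by-SNP array into U (which haplotype each read came from) and V (the haplotypes). As a rough illustration only, the sketch below uses a simple hard-assignment alternating scheme: the U-step assigns each read to the haplotype it disagrees with least on its observed SNPs, and the V-step sets each haplotype position by majority vote of the reads assigned to it. This is a simplified stand-in, not AltHap's actual update rules. It assumes biallelic SNPs coded as +1/-1 with 0 for uncovered positions and a caller-supplied initial guess `V0`; the names `assign_reads` and `alt_min_haplotypes` are hypothetical.

```python
import numpy as np


def assign_reads(R, V):
    """U-step: one-hot assignment of each read to its closest haplotype.

    Distance is the number of mismatches on the read's observed SNPs (R != 0);
    ties go to the lowest-index haplotype (np.argmin behaviour).
    """
    observed = R != 0
    mismatches = np.stack(
        [((R != V[j]) & observed).sum(axis=1) for j in range(V.shape[0])], axis=1
    )  # shape (n_reads, k)
    return np.eye(V.shape[0], dtype=int)[mismatches.argmin(axis=1)]


def alt_min_haplotypes(R, V0, n_iters=50):
    """Hypothetical hard-assignment alternating scheme for R ~ U @ V on observed entries.

    R  : (n_reads, m_snps) int array, +1/-1 = observed allele, 0 = not covered (assumed coding).
    V0 : (k, m_snps) initial haplotype guess in {+1, -1}; k is the ploidy.
    Returns U (n_reads, k) one-hot read origins and V (k, m_snps) haplotypes.
    """
    V = V0.copy()
    for _ in range(n_iters):
        U = assign_reads(R, V)
        votes = U.T @ R  # (k, m): signed allele counts from the reads assigned to each haplotype
        # V-step: majority vote; keep the previous allele where votes tie or no read covers the SNP
        V_new = np.where(votes > 0, 1, np.where(votes < 0, -1, V))
        if np.array_equal(V_new, V):
            break
        V = V_new
    return assign_reads(R, V), V


# Toy diploid example: four error-free reads, initial guess wrong at the last SNP.
R = np.array([[1, 1, -1, 0],
              [0, 1, -1, 1],
              [-1, -1, 1, 0],
              [0, -1, 1, -1]])
V0 = np.array([[1, 1, -1, -1],
               [-1, -1, 1, 1]])
U, V = alt_min_haplotypes(R, V0)
print(V.tolist())  # [[1, 1, -1, 1], [-1, -1, 1, -1]]
```

Because the scheme is greedy, a poor `V0` can leave it stuck in a bad local optimum (ties always go to haplotype 0), so trying several initializations would be a sensible precaution.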

Cited by 4 publications (14 citation statements); references 42 publications.
“…where M is the one-to-one mapping from the set of reconstructed haplotype to the set of true haplotype (Hashemi 2018), i.e., mapping that determines the best possible match between the two sets of haplotypes. To characterize performance of methods for reconstruction of viral quasispecies with generally a priori unknown number of components, in addition to correct phasing rate we also quantify recall rate, defined as the fraction of perfectly reconstructed components in a population (i.e., recall rate = $\frac{TP}{TP+FN}$), and predicted proportion, defined as the ratio of the estimated and the true number of components in a genomic mixture (Ahn 2018).…”
Section: Problem Formulation (mentioning; confidence: 99%)
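The three metrics in the statement above can be sketched as follows. The excerpt does not reproduce the CPR formula itself, so the sketch assumes CPR is the fraction of SNP positions that agree between true and reconstructed haplotypes under the best one-to-one mapping M; recall rate and predicted proportion follow the definitions quoted above. Haplotypes are assumed to be equal-length strings, and all function names are hypothetical.

```python
from itertools import permutations


def correct_phasing_rate(true_haps, est_haps):
    """CPR sketch: fraction of alleles that agree under the best one-to-one mapping M.

    Assumes equal numbers of true and estimated haplotypes and searches all k!
    mappings, which is only practical for small ploidy.
    """
    k, m = len(true_haps), len(true_haps[0])
    if len(est_haps) != k:
        raise ValueError("this CPR sketch needs as many estimated as true haplotypes")
    best_matches = max(
        sum(a == b
            for i, j in enumerate(perm)  # true haplotype i is mapped to estimate j
            for a, b in zip(true_haps[i], est_haps[j]))
        for perm in permutations(range(k))
    )
    return best_matches / (k * m)


def recall_rate(true_haps, est_haps):
    """TP / (TP + FN): share of true components reconstructed exactly."""
    estimated = set(est_haps)
    tp = sum(h in estimated for h in true_haps)
    fn = len(true_haps) - tp
    return tp / (tp + fn)


def predicted_proportion(true_haps, est_haps):
    """Ratio of the estimated to the true number of components."""
    return len(est_haps) / len(true_haps)


true_haps = ["0110", "1001"]
est_haps = ["1001", "0111"]
print(correct_phasing_rate(true_haps, est_haps))  # 0.875
print(recall_rate(true_haps, est_haps))           # 0.5
```

The exhaustive search over mappings keeps the sketch short; for larger k, an assignment solver such as the Hungarian method would find the same best mapping in polynomial time.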
“…The semi-experimental data is obtained by simulating mutations, shotgun sequencing procedure, read alignment and SNP calling steps in a fictitious experiment on a single individual Solanum Tuberosum (polyploid with k = 4). Details on how exactly the semi-experimental data is generated and processed can be found in Supplementary Document C. We compare the performance of GAEseq on this data with publicly available software HapCompass (Aguiar 2012), an algorithm that relies on graph-theoretic models to perform haplotype assembly, H-PoP (Xie 2016), a dynamic programming method, and AltHap (Hashemi 2018), a method based on tensor factorization. The performance of different methods is evaluated in terms of the MEC score and CPR.…”
Section: Performance Comparison On Biallelic Solanum Tuberosum Semi-e… (mentioning; confidence: 99%)
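The comparison above scores methods by MEC and CPR. For a fixed set of haplotypes, the minimum error correction (MEC) score counts the read entries that must be flipped so that every read agrees with some haplotype, i.e. each read is charged its mismatches against its best-matching haplotype on the positions it covers. The minimal sketch below assumes reads and haplotypes are equal-length strings with `-` marking uncovered SNPs; `mec_score` is a hypothetical name.

```python
def mec_score(reads, haplotypes, gap="-"):
    """Minimum error correction score for fixed haplotypes.

    Each read is charged the number of its covered positions that disagree with
    the haplotype it matches best; the MEC score is the sum over reads.
    """
    return sum(
        min(sum(1 for r, h in zip(read, hap) if r != gap and r != h)
            for hap in haplotypes)
        for read in reads
    )


haplotypes = ["0110", "1001"]
reads = ["01--", "-11-", "10-1", "-00-", "0-01"]  # last read carries one sequencing error
print(mec_score(reads, haplotypes))  # 1
```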