2018
DOI: 10.1101/305268
Preprint

CHOP: Haplotype-aware path indexing in population graphs

Abstract: The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. We propose CHOP, a method that uses haplotype information to prevent this from happening. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based represen…

Citations: cited by 9 publications (8 citation statements)
References: 32 publications
“…Also like FORGe, we showed that aligning to a super-population-matched major-allele reference did not substantially improve alignment accuracy compared to a global major-allele reference combining all super populations. Our results also reinforce that a linear aligner can be extended to incorporate variants and exhibit similar accuracy to a graph aligner [16,31].…”
Section: Discussion (supporting)
confidence: 78%
“…This might be accomplished using unsupervised, sequence-driven clustering methods [34,35], using the "founder sequence" framework [36,37], or using some form of submodular optimization [38]. A more radical idea is to simply index all available individuals, forgoing the need to choose representatives; this is becoming more practical with the advent of new approaches for haplotype-aware path indexing [31] and efficient indexing for repetitive texts [39].…”
Section: Discussion (mentioning)
confidence: 99%
“…Also like FORGe, we showed that aligning to a super-population-matched major-allele reference did not substantially improve alignment accuracy compared to a global major-allele reference combining all super populations. Our results also reinforce that a linear aligner can be extended to incorporate variants and exhibit similar accuracy to a graph aligner [16,33].…”
Section: Discussion (supporting)
confidence: 78%
“…This might be accomplished using unsupervised, sequence-driven clustering methods [36,37], using the "founder sequence" framework [38,39], or using some form of submodular optimization [40]. A more radical idea is to simply index all available individuals, forgoing the need to choose representatives; this is becoming more practical with the advent of new approaches for haplotype-aware path indexing [33] and efficient indexing for repetitive texts [41].…”
Section: Discussion (mentioning)
confidence: 99%
“…Indexing variation graphs is challenging because the number of possible paths can be exponential in the number of variants encoded. Typical approaches to handle this problem are to index only some of the variation by limiting the indexed paths either heuristically [16,27,28] or by using panels of known haplotypes [29,30]. A recent method avoids the exponential blowup by dynamically indexing the graph and the reads, thereby exploiting that there can be exponentially many paths in the graphs, but not in the set of reads to be queried [31].…”
Section: Introduction (mentioning)
confidence: 99%