2022
DOI: 10.1007/978-3-031-04749-7_15

Co-linear Chaining with Overlaps and Gap Costs

Abstract: Motivation: Co-linear chaining has proven to be a powerful technique for finding approximately optimal alignments and approximating edit distance. It is used as an intermediate step in numerous mapping tools that follow a seed-and-extend strategy. Despite this popularity, subquadratic time algorithms for the case where chains support anchor overlaps and gap costs are not currently known. Moreover, a theoretical connection between co-linear chaining cost and edit distance remains unknown. Results: We present algo…

citations
Cited by 13 publications
(11 citation statements)
references
References 32 publications
“…We only use k-mer seeds in this study, although other types of seeds are possible (Keich et al., 2004; Kiełbasa et al., 2011). An optimal increasing subsequence of possibly overlapping anchors based on some score is then collected into a chain, where increasing is defined with the standard precedence relationship (Jain et al., 2022) between k-mer anchors (See Figure 5a and Chaining below). The chain is extended into a full alignment by aligning between anchor gaps in the chain.…”
Section: Assumptions and Models (mentioning)
confidence: 99%
“…Extension and chaining runtimes: Given sorted anchors, let T_Chain be the time spent finding an optimal chain. T_Chain depends on the objective function (Mäkinen and Sahlin, 2020; Jain et al., 2022; Abouelhoda and Ohlebusch, 2005; Otto et al., 2011). Since our gap costs are linear, T_Chain = O(N log N) where N is the number of anchors (Abouelhoda and Ohlebusch, 2005).…”
Section: Assumptions and Models (mentioning)
confidence: 99%
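
The chaining step described in the statements above can be illustrated with a minimal Python sketch. It is not the O(N log N) algorithm the quoted passage refers to, nor the method of Jain et al.; it is a plain quadratic dynamic program under stated assumptions. The anchor layout `(query_start, ref_start, length)`, the strict start-based `precedes` test, the diagonal-difference `gap_cost`, and the scoring (anchor length minus a linear gap penalty, with overlapping bases counted twice) are all hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# Hypothetical anchor layout: (query_start, ref_start, length) of an exact match.
Anchor = Tuple[int, int, int]


def precedes(a: Anchor, b: Anchor) -> bool:
    """Assumed precedence: a starts strictly before b in both sequences.

    Overlapping anchors are allowed, since only start positions are compared.
    """
    return a[0] < b[0] and a[1] < b[1]


def gap_cost(a: Anchor, b: Anchor, per_unit: float = 1.0) -> float:
    """Assumed linear gap cost: penalize the change of diagonal between a and b."""
    return per_unit * abs((b[0] - a[0]) - (b[1] - a[1]))


def best_chain(anchors: List[Anchor], per_unit: float = 1.0) -> Tuple[float, List[Anchor]]:
    """Return (score, chain) for a highest-scoring chain, using an O(N^2) DP."""
    order = sorted(anchors)  # any predecessor has a smaller query_start, so it sorts earlier
    n = len(order)
    if n == 0:
        return 0.0, []
    score = [float(a[2]) for a in order]  # best score of a chain ending at each anchor
    back = [-1] * n                       # predecessor index used for traceback
    for i in range(n):
        for j in range(i):
            if precedes(order[j], order[i]):
                cand = score[j] + order[i][2] - gap_cost(order[j], order[i], per_unit)
                if cand > score[i]:
                    score[i] = cand
                    back[i] = j
    end = max(range(n), key=score.__getitem__)
    best = score[end]
    chain: List[Anchor] = []
    while end != -1:
        chain.append(order[end])
        end = back[end]
    chain.reverse()
    return best, chain


if __name__ == "__main__":
    demo = [(0, 0, 5), (3, 3, 5), (10, 50, 5), (8, 8, 5)]
    print(best_chain(demo))
```

In the demo input, three anchors lie on the same diagonal and chain with no gap penalty, while the anchor at `(10, 50, 5)` is far off-diagonal and is left out because its gap cost outweighs its length. Faster algorithms typically replace the inner loop with range queries over a search tree, which this sketch does not attempt.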
“…Namely, we obtain an O(m + n + k^2|V| + |E| + kN log N) time algorithm for computing a longest common subsequence (LCS) between a query string Q and a path of G, where m = |Q|, n is the total length of node labels, k is the width (minimum number of paths covering the nodes) of G, and N is the number of maximal exact matches (MEMs) between Q and the node labels (node MEMs). For the case with two strings as input, a recent formulation of co-linear chaining [19] captures unit cost edit distance. There has been an attempt to extend the results to graphs considering gap costs [6], but it appears difficult to make such formulation fully symmetric (due to there being exponential many paths between two anchors).…”
Section: Introduction (mentioning)
confidence: 99%
“…Co-linear chaining is a mathematically rigorous approach to do clustering of anchors. It is well studied for the case of sequence-to-sequence alignment [1,11,12,16,30,34,43], and is widely used in present-day long read to reference sequence aligners [18,23,38,40].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, the problem formulations in these works did not include gap cost. Without penalizing gaps, co-linear chaining is less effective [16]. A challenge in enforcing gap cost is that measuring gap between two loci in a DAG is not a constant-time operation like in a sequence.…”
Section: Introduction (mentioning)
confidence: 99%