Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing 2016
DOI: 10.1145/2897518.2897577

Streaming algorithms for embedding and computing edit distance in the low distance regime

Abstract: The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. In this paper we study the computational problem of computing the edit distance between a pair of strings where their distance is bounded by a parameter k ≪ n. We present two streaming algorithms for computing edit distance: one runs in time O(n + k^2) and the other in time n + O(k^3). By writing n + O…
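To make the low-distance setting concrete, here is a minimal sketch of a classical banded dynamic program that returns the edit distance only when it is at most k. It is not the streaming algorithm from the paper: it takes O(n·k) time rather than O(n + k^2), and the function name `bounded_edit_distance` is a hypothetical choice for illustration.

```python
def bounded_edit_distance(a: str, b: str, k: int):
    """Return the edit distance of a and b if it is at most k, else None.

    Illustrative banded DP (O(n*k) time), not the paper's algorithm.
    Only cells D[i][j] with |i - j| <= k are stored, indexed by d = j - i + k.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the length difference alone already exceeds k
    inf = k + 1      # any value above k is treated as "too far"
    width = 2 * k + 1

    # Row 0: D[0][j] = j for the columns j that fall inside the band.
    prev = [inf] * width
    for d in range(width):
        j = d - k
        if 0 <= j <= m:
            prev[d] = j

    for i in range(1, n + 1):
        cur = [inf] * width
        for d in range(width):
            j = i + d - k
            if j < 0 or j > m:
                continue
            if j == 0:
                best = i  # delete the first i characters of a
            else:
                # Diagonal move: match or substitute a[i-1] with b[j-1].
                best = prev[d] + (a[i - 1] != b[j - 1])
                if d > 0:
                    best = min(best, cur[d - 1] + 1)  # insert b[j-1]
            if d + 1 < width:
                best = min(best, prev[d + 1] + 1)     # delete a[i-1]
            cur[d] = min(best, inf)
        prev = cur

    result = prev[m - n + k]
    return result if result <= k else None
```

For example, `bounded_edit_distance("kitten", "sitting", 3)` reports a distance, while the same call with `k = 2` returns `None` because the true distance exceeds the threshold.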

Cited by 57 publications (108 citation statements)
References: 32 publications
“…Independently developed fuzzy extractors [9] can also be seen as providing a document exchange scheme for some k polynomially small in n. A randomized scheme by Jowhari [10] independently achieved a size of O(k log n log^* n). In two recent breakthroughs Chakraborty, Goldenberg, and Koucký [11] designed a low distortion embedding from edit distance to Hamming distance which can be used to get a summary of size Θ(k^2 log n) and Belazzougui and Zhang [2] further build on this randomized embedding and achieved a scheme with summary size Θ(k log^2 k + k log n) which is order optimal for k = exp(√(log n)). All of these schemes are randomized.…”
Section: Introduction (mentioning)
confidence: 99%
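The randomized embedding from edit distance to Hamming distance mentioned in this statement can be illustrated with a short sketch. It follows the random-walk construction as it is commonly described: copy the current input character to the output, then advance the input pointer or stay, according to a shared random bit that depends on the step and the character. The names `cgk_embed` and `hamming`, the pad symbol, the seed handling, and the output length of three times the input length are assumptions made for this illustration, not details taken from the page above.

```python
import random


def cgk_embed(x: str, alphabet: str, seed: int, out_len: int, pad: str = "#") -> str:
    """Random-walk embedding sketch (hypothetical helper, assumptions in lead-in).

    Strings that are to be compared must be embedded with the same seed,
    alphabet, and out_len so that they share the same random bits.
    """
    rng = random.Random(seed)
    symbols = sorted(set(alphabet))  # fixed order keeps the randomness reproducible
    out = []
    i = 0
    for _ in range(out_len):
        # One random bit per alphabet symbol at every step, drawn regardless
        # of x, so every embedded string consumes the identical bit sequence.
        bits = {c: rng.getrandbits(1) for c in symbols}
        if i < len(x):
            out.append(x[i])
            i += bits[x[i]]  # advance on 1, repeat the character on 0
        else:
            out.append(pad)  # input exhausted: pad the rest of the output
    return "".join(out)


def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(u, v))


if __name__ == "__main__":
    s, t = "ACGTACGT", "ACGACGTT"
    length = 3 * max(len(s), len(t))
    es = cgk_embed(s, "ACGT", seed=1, out_len=length)
    et = cgk_embed(t, "ACGT", seed=1, out_len=length)
    print(es, et, hamming(es, et))
```

Because the random bits are drawn for every symbol at every step, two identical inputs always produce identical embeddings, and the pointers for two similar inputs can fall back into step after a mismatch.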
“…for the Ulam metric (edit distance with no repetition, which obviously requires a large alphabet) distinguish between t vs Θ(t) in O(n/t + √n) time, achieving a bound that is similar to the folklore sampling algorithm for approximating Hamming distance. There is a long line of work on edit distance and related problems, aiming to achieve fast running time [AN10, AIKH13, Sah17, BEG+18, HSSS19], low distortion embedding [OR07, KR06, CGK16b, BZ16], small space complexity [CGK16b, BZ16, BJKK04] and parallel algorithms [HSS19]. The work of Andoni, Onak and Krauthgamer [AKO10] achieves a sublinear asymmetric query complexity for approximating edit distance; however it does not lead to any sublinear time algorithm since one of the strings must be read in its entirety.…”
Section: Results (mentioning)
confidence: 99%
“…Their algorithm computes a constant-size sketch but still requires a linear pass over the data. This result was later improved to hold for general strings [CGK16b] via embedding into Hamming distance, but again in linear time.…”
Section: What Is the Right Gap? (mentioning)
confidence: 98%
“…Then, random number u returned from Func F is converted to a random number from the Cauchy distribution in Func H as tan(π · (u − 0.5))/β at line 8. [2], [33] is another string embedding using a randomized algorithm. Let S_i for i = 1, 2, ..., N be input strings of alphabet Σ and let L be the maximum length of input strings.…”
Section: Scalable Alignment Kernels (mentioning)
confidence: 99%
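The conversion quoted in this statement, tan(π · (u − 0.5))/β, is the standard inverse-CDF transform from a uniform sample to a Cauchy sample. The sketch below shows that one step in isolation; the function name `cauchy_from_uniform` is hypothetical, and the Func F and Func H routines of the citing paper are not reproduced here.

```python
import math
import random


def cauchy_from_uniform(u: float, beta: float) -> float:
    """Map a uniform sample u in (0, 1) to a Cauchy-distributed value.

    tan(pi * (u - 0.5)) is a standard Cauchy variate; dividing by beta
    rescales it, following the formula quoted in the statement above.
    """
    return math.tan(math.pi * (u - 0.5)) / beta


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only so that repeated runs agree
    samples = [cauchy_from_uniform(rng.random(), beta=2.0) for _ in range(5)]
    print(samples)
```

The midpoint u = 0.5 maps to 0, and values of u close to 0 or 1 map to very large magnitudes, which reflects the heavy tails of the Cauchy distribution.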