We consider two closely related problems of text indexing in sub-linear working space. The first problem is Sparse Suffix Tree (SST) construction, where a text S is given in read-only memory, along with a set B of suffixes, and the goal is to construct the compressed trie of all these suffixes ordered lexicographically, using only O(|B|) words of space. The second problem is the Longest Common Extension (LCE) problem, where again a text S of length n is given in read-only memory, together with a parameter 1 ≤ τ ≤ n, and the goal is to construct a data structure that uses O(n/τ) words of space and can compute, for any pair of suffixes, the length of their longest common prefix. We show how to use ideas based on the Locally Consistent Parsing technique, introduced by Sahinalp and Vishkin [35], in some non-trivial ways to improve the known results for the above problems. We introduce new Las-Vegas and deterministic algorithms for both problems.

For the randomized algorithms, we introduce the first Las-Vegas SST construction algorithm that takes O(n) time. This improves upon the result of Gawrychowski and Kociumaka [19], who obtained O(n) time for a Monte-Carlo algorithm and O(n log |B|) time for a Las-Vegas algorithm. In addition, we introduce a randomized Las-Vegas construction of a data structure that uses O(n/τ) words of space, can be constructed in linear time, and answers LCE queries in O(τ) time.

For the deterministic algorithms, we introduce an SST construction algorithm that takes O(n(log(n/|B|) + log* n)) time (for |B| = Ω(log n log* n)). This is the first almost linear time, O(n · polylog n), deterministic SST construction algorithm; all previous algorithms take at least Ω(min{n|B|, n²/|B|}) time. For the LCE problem, we introduce a data structure that uses O(n/τ) words of space and answers LCE queries in O(τ log* n) time, with O(n(log τ + log* n)) construction time (for τ = O(n/(log n log* n))). This data structure improves upon the results of Tanimura et al. [37] in both query time and construction time.
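To make the two problem definitions above concrete, here is a minimal naive baseline in Python. It is illustrative only: it is not the paper's algorithm, it ignores the O(|B|) and O(n/τ) space budgets entirely, and all function names are ours. It shows what an LCE query returns and in what order an SST stores the sampled suffixes:

    def lce(s: str, i: int, j: int) -> int:
        """Length of the longest common prefix of suffixes s[i:] and s[j:], by an O(n) scan."""
        k = 0
        while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
            k += 1
        return k

    def sparse_suffix_order(s: str, b: list) -> list:
        # Lexicographic order of the sampled suffixes. The SST is the compacted
        # trie of exactly these suffixes, recoverable from this order plus the
        # LCE of each neighboring pair. Copying s[i:] as a sort key costs
        # Theta(n) space per suffix, which is what the small-space algorithms avoid.
        return sorted(b, key=lambda i: s[i:])

    s = "banana"
    print(lce(s, 1, 3))                       # 3: "anana" and "ana" share "ana"
    print(sparse_suffix_order(s, [0, 2, 4]))  # [0, 4, 2]: "banana" < "na" < "nana"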
We study the classic Text-to-Pattern Hamming Distances problem: given a pattern P of length m and a text T of length n, both over a polynomial-size alphabet, compute the Hamming distance between P and T[i..i+m−1] for every shift i, under the standard Word-RAM model with Θ(log n)-bit words. • We provide an O(n√m) time Las Vegas randomized algorithm for this problem, beating the decades-old O(n√(m log m)) running time [Abrahamson, SICOMP 1987]. We also obtain a deterministic algorithm with a slightly higher O(n√m (log m log log m)^(1/4)) running time. Our randomized algorithm extends to the k-bounded setting, with running time O(n + nk/√m), removing all the extra logarithmic factors from earlier algorithms [Gawrychowski and Uznański, ICALP 2018; Chan, Golan, Kociumaka, Kopelowitz and Porat, STOC 2020].
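For reference, the problem being sped up is exactly the following O(nm)-time computation, written out in Python as a brute-force specification (not any of the algorithms cited above):

    def hamming_distances(t: str, p: str) -> list:
        """dist[i] = Hamming distance between p and t[i : i + len(p)], for every shift i."""
        n, m = len(t), len(p)
        return [sum(t[i + j] != p[j] for j in range(m)) for i in range(n - m + 1)]

    print(hamming_distances("abracadabra", "abra"))  # [0, 4, 3, 3, 3, 3, 4, 0]

In the k-bounded setting, only the distances that are at most k must be reported exactly; every larger distance may simply be reported as exceeding k, which is what makes the O(n + nk/√m) bound possible.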
In the pattern matching with d wildcards problem, one is given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol '?'. A wildcard character matches any other character. The goal is to establish, for each m-length substring of T, whether it matches P. In the streaming-model variant of this problem, the text T arrives one character at a time, and the goal is to report, before the next character arrives, whether the last m characters match P, while using only o(m) words of space.

In this paper we introduce two new algorithms for the d wildcard pattern matching problem in the streaming model. The first is a randomized Monte-Carlo algorithm parameterized by a constant 0 ≤ δ ≤ 1; it uses Õ(d^(1−δ)) amortized time per character and Õ(d^(1+δ)) words of space. The second algorithm, which is used as a black box by the first, is a randomized Monte-Carlo algorithm that uses O(d + log m) worst-case time per character and O(d log m) words of space.¹

* Part of this work took place while the second author was at the University of Michigan.
¹ We assume the RAM model, where each word has size O(log n) bits.
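As a point of reference, the offline semantics that the streaming algorithms above must reproduce (one character at a time, in o(m) words) is the following brute-force check, written in Python purely for illustration:

    def wildcard_matches(t: str, p: str) -> list:
        """All shifts i such that t[i : i + len(p)] matches p, where '?' in p matches anything."""
        n, m = len(t), len(p)
        return [i for i in range(n - m + 1)
                if all(p[j] == '?' or t[i + j] == p[j] for j in range(m))]

    print(wildcard_matches("abcabd", "ab?"))  # [0, 3]: shifts 0 and 3 match

Note that the brute force keeps the whole m-character window; the entire difficulty in the streaming model is answering at each arrival without ever holding Θ(m) characters.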