We present an improved wavelet tree construction algorithm and discuss its applications to a number of rank/select problems for integer keys and strings. Given a string of length n over an alphabet of size σ ≤ n, our method builds the wavelet tree in O(n log σ / √(log n)) time, improving upon the state-of-the-art algorithm by a factor of √(log n). As a consequence, given an array of n integers we can construct in O(n √(log n)) time a data structure consisting of O(n) machine words and capable of answering rank/select queries for the subranges of the array in O(log n / log log n) time. This is a log log n-factor improvement in query time compared to Chan and Pătraşcu (SODA 2010) and a √(log n)-factor improvement in construction time compared to Brodal et al. (Theor. Comput. Sci. 2011). Next, we switch to a stringological context and propose a novel notion of wavelet suffix trees. For a string w of length n, this data structure occupies O(n) words, takes O(n √(log n)) time to construct, and simultaneously captures the combinatorial structure of substrings of w while enabling efficient top-down traversal and binary search. In particular, with a wavelet suffix tree we are able to answer in O(log |x|) time the following two natural analogues of rank/select queries for suffixes of substrings: 1) for substrings x and y of w (given by their endpoints), count the number of suffixes of x that are lexicographically smaller than y; 2) for a substring x of w (given by its endpoints) and an integer k, find the k-th lexicographically smallest suffix of x. We further show that wavelet suffix trees can be used to compute a run-length-encoded Burrows-Wheeler transform of a substring x of w (again, given by its endpoints) in O(s log |x|) time, where s denotes the length of the resulting run-length encoding. This answers a question by Cormode and Muthukrishnan (SODA 2005), who considered an analogous problem for Lempel-Ziv compression. All our algorithms, except for the construction of wavelet suffix trees, which additionally requires O(n) time in expectation, are deterministic and operate in the word RAM model.
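For readers unfamiliar with wavelet trees, the following is a minimal sketch of the plain, pointer-based structure with a naive O(n log σ) construction. It is not the faster construction described in the abstract, and it uses uncompressed prefix-count arrays instead of succinct bitvectors. The class name `WaveletTree` and the methods `rank` and `quantile` are illustrative names chosen here, not taken from the paper.

```python
class WaveletTree:
    """Naive wavelet tree over integers in [lo, hi] (illustrative sketch only)."""

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)  # the top-level call assumes a non-empty sequence
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.left = self.right = None
        if self.lo == self.hi or not seq:
            return  # leaf (single symbol) or empty node
        mid = (self.lo + self.hi) // 2
        # prefix[i] = how many of the first i symbols are routed to the left child
        self.prefix = [0]
        for v in seq:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], self.lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, self.hi)

    def rank(self, c, i):
        """Number of occurrences of c among the first i symbols."""
        if c < self.lo or c > self.hi:
            return 0
        node = self
        while node.left is not None:
            mid = (node.lo + node.hi) // 2
            if c <= mid:
                i = node.prefix[i]
                node = node.left
            else:
                i -= node.prefix[i]
                node = node.right
        return i

    def quantile(self, l, r, k):
        """k-th smallest value (0-based) in seq[l:r]; assumes 0 <= k < r - l."""
        node = self
        while node.left is not None:
            in_left = node.prefix[r] - node.prefix[l]
            if k < in_left:
                l, r = node.prefix[l], node.prefix[r]
                node = node.left
            else:
                k -= in_left
                l, r = l - node.prefix[l], r - node.prefix[r]
                node = node.right
        return node.lo


wt = WaveletTree([3, 1, 4, 1, 5, 9, 2, 6])
print(wt.rank(1, 4))         # occurrences of 1 in the first four symbols
print(wt.quantile(2, 7, 2))  # third smallest value in positions 2..6
```

Each query walks one root-to-leaf path, so it takes O(log σ) steps in this sketch; the O(log n / log log n) query bound quoted above relies on techniques that this sketch does not implement.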
Abstract. We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.
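To make the problem statement concrete, here is a small sketch of streaming dictionary matching using the classical Aho-Corasick automaton as a baseline. This is not the randomised algorithm of the abstract: it stores the whole dictionary, so its space is proportional to the total dictionary length instead of O(k log m) words. The function names `build_automaton` and `match_stream` are hypothetical, and the patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables of an Aho-Corasick automaton."""
    goto = [{}]   # goto[s][ch] -> child state in the trie
    fail = [0]    # fail[s] -> state of the longest proper suffix in the trie
    out = [[]]    # out[s] -> patterns that end at state s
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first traversal so that fail links of shallower states are ready.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out


def match_stream(patterns, stream):
    """Yield (position, pattern) each time a pattern ends at that position."""
    goto, fail, out = build_automaton(patterns)
    s = 0
    for pos, ch in enumerate(stream):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            yield pos, p


for pos, pat in match_stream(["he", "she", "his", "hers"], "ushers"):
    print(pos, pat)
```

The generator consumes the stream one character at a time and keeps only the current automaton state between characters, which mirrors the streaming setting described above.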
Abstract. Given m documents of total length n, we consider the problem of finding a longest string common to at least d ≥ 2 of the documents. This problem is known as the longest common substring (LCS) problem and has a classic O(n) space and O(n) time solution (Weiner [FOCS'73], Hui [CPM'92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter 1 ≤ τ ≤ n, the LCS problem can be solved in O(τ) space and O(n^2/τ) time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a τ-additive approximation to the LCS in O(n^2/τ) time and O(1) space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in O(τ) space must use Ω(n log(n/(τ log n)) / log log(n/(τ log n))) time.
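As an illustration of the constant-space end of this trade-off, the sketch below computes a longest common substring of two strings by scanning every alignment (diagonal) of one string against the other. It covers only the case d = m = 2, runs in O(|a| · |b|) time with O(1) extra working space, and is a textbook baseline, not the algorithm of the paper. The function name `lcs_constant_space` is hypothetical.

```python
def lcs_constant_space(a, b):
    """Longest common substring of a and b using O(1) extra working space."""
    best_len, best_start = 0, 0
    # Each shift fixes the difference i - j between positions in a and b.
    for shift in range(-(len(b) - 1), len(a)):
        i = max(shift, 0)
        j = i - shift
        run = 0  # length of the current run of matching characters
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
                if run > best_len:
                    best_len, best_start = run, i - run + 1
            else:
                run = 0
            i += 1
            j += 1
    return a[best_start:best_start + best_len]


print(lcs_constant_space("xabcdy", "zbcdw"))
```

Only a handful of integer counters are kept between steps; the returned slice is materialised once at the end for convenience.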
We consider the problems of computing the maximal and the minimal non-empty suffixes of substrings of a longer text of length n. For the minimal suffix problem we show that for every τ, 1 ≤ τ ≤ log n, there exists a linear-space data structure with O(τ) query time and O(n log n/τ) preprocessing time. As a sample application, we show that this data structure can be used to compute the Lyndon decomposition of any substring of the text in O(kτ) time, where k is the number of distinct factors in the decomposition. For the maximal suffix problem, we give a linear-space structure with O(1) query time and O(n) preprocessing time. In other words, we simultaneously achieve both the optimal query time and the optimal construction time. This article is based on a study first reported at the 24th and 25th Symposia on Combinatorial Pattern Matching.
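For context, the Lyndon decomposition mentioned above can be computed for a whole string in linear time with Duval's classical algorithm, and the last factor of the decomposition is the minimal non-empty suffix. The sketch below shows that baseline; it works on a given string directly and does not implement the substring data structure of the abstract. The function name `lyndon_factorization` is an illustrative choice.

```python
def lyndon_factorization(s):
    """Duval's algorithm: factor s into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it stays a prefix of a power of a Lyndon word.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


w = "banana"
factors = lyndon_factorization(w)
print(factors)      # Lyndon factors of w, in order
print(factors[-1])  # the last factor is the minimal non-empty suffix of w
```

To handle a substring w[i:j] with this baseline, one would run the function on the slice itself, which costs time linear in the substring length instead of the O(kτ) bound stated above.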
Abstract. The Longest Common Substring problem is to compute the longest substring which occurs in at least d ≥ 2 of m strings of total length n. In this paper we ask the question whether this problem allows a deterministic time-space trade-off using O(n^{1+ε}) time and O(n^{1−ε}) space for 0 ≤ ε ≤ 1. We give a positive answer in the case of two strings (d = m = 2) and 0 < ε ≤ 1/3. In the general case where 2 ≤ d ≤ m, we show that the problem can be solved in O(n^{1−ε}) space and O(n^{1+ε} log^2 n (d log^2 n + d^2)) time for any 0 ≤ ε < 1/3.