Inge Li Gørtz scite author profile

Tree Compression with Top Trees

We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast navigational queries directly on the compressed representation. We show that the new compression scheme achieves close to optimal worst-case compression, can compress exponentially better than DAG compression, is never much worse than DAG compression, and supports navigational queries in logarithmic time.


Longest Common Extensions in Sublinear Space

Bille, Gørtz, Knudsen et al. 2015


The longest common extension problem (LCE problem) is to construct a data structure for an input string T of length n that supports LCE(i, j) queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions i and j in T. This classic problem has a well-known solution that uses O(n) space and O(1) query time. In this paper we show that for any trade-off parameter 1 ≤ τ ≤ n, the problem can be solved in O(n/τ) space and O(τ) query time. This significantly improves the previously best known time-space trade-offs, and almost matches the best known time-space product lower bound.
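To make the query concrete, here is a minimal Python sketch of LCE(i, j) answered by direct character comparison. It is not the data structure from the paper: it stores nothing beyond the text and may scan up to n characters per query. The function name `lce` and the 0-indexed positions are assumptions made for illustration.

```python
def lce(t: str, i: int, j: int) -> int:
    """Length of the longest common prefix of t[i:] and t[j:].

    Naive baseline: no preprocessing, worst-case O(n) time per query.
    Positions are 0-indexed (an assumption; the abstract does not fix this).
    """
    n = len(t)
    k = 0
    # Extend the match one character at a time until a mismatch or the end.
    while i + k < n and j + k < n and t[i + k] == t[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    text = "abracadabra"
    # Suffixes "abracadabra" and "abra" share the prefix "abra".
    print(lce(text, 0, 7))
```

The trade-off described above sits between this scan and the classic O(n)-space, O(1)-time solution.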


Tree compression with top trees

Bille, Gørtz, Landau et al. 2015

Information and Computation


We introduce a new compression scheme for labeled trees based on top trees [3]. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast navigational queries directly on the compressed representation. We show that the new compression scheme achieves close to optimal worst-case compression, can compress exponentially better than DAG compression, is never much worse than DAG compression, and supports navigational queries in logarithmic time.
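For context, the classical DAG compression that the abstract compares against can be sketched in a few lines of Python: identical rooted subtrees are merged into a single node. This is only the baseline, not the top-tree scheme itself. The tree encoding `(label, [children])` with ordered children and the name `dag_compress` are assumptions made for illustration.

```python
def dag_compress(root):
    """Return (tree_nodes, dag_nodes) for a tree given as (label, [children]).

    Two subtrees are merged when they have the same label and the same
    ordered sequence of (already merged) children.
    """
    ids = {}          # canonical key of a rooted subtree -> DAG node id
    tree_nodes = 0

    def visit(node):
        nonlocal tree_nodes
        tree_nodes += 1
        label, children = node
        key = (label, tuple(visit(child) for child in children))
        if key not in ids:
            ids[key] = len(ids)
        return ids[key]

    visit(root)
    return tree_nodes, len(ids)


if __name__ == "__main__":
    leaf = ("c", [])
    twin = ("b", [leaf, leaf])
    # Repeated rooted subtrees are shared: 7 tree nodes, 3 DAG nodes.
    print(dag_compress(("a", [twin, twin])))

    # A path of identical labels has no repeated rooted subtree,
    # so the DAG is as large as the tree: 4 tree nodes, 4 DAG nodes.
    path = ("a", [("a", [("a", [("a", [])])])])
    print(dag_compress(path))
```

The second example hints at the kind of internal repeat that rooted-subtree sharing cannot exploit.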


Sparse Text Indexing in Small Space

Bille, Fischer, Gørtz et al. 2016

ACM Trans. Algorithms


In this work we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays and sparse position heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during the construction. Attempts at breaking the naive bound of Ω(nb) time for constructing sparse suffix trees in O(b) space can be traced back to the origins of string indexing in 1968. First results were only obtained in 1996, but only for the case where the b suffixes were evenly spaced in T. In this paper there is no constraint on the locations of the suffixes. Our main contribution is to show that the sparse suffix tree (and array) can be constructed in O(n log² b) time. To achieve this we develop a technique that efficiently answers b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte-Carlo and outputs the correct tree with high probability. We then give a Las-Vegas algorithm which also uses O(b) space and runs in the same time bounds with high probability when b = O(√n). Furthermore, additional tradeoffs between the space usage and the construction time for the Monte-Carlo algorithm are given. Finally, we show that at the expense of slower pattern queries, it is possible to construct sparse position heaps in O(n + b log b) time and O(b) space.
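As an illustration of the object being built, the following Python sketch constructs a sparse suffix array naively: it sorts the b chosen positions by comparing their suffixes character by character, without copying any suffix. This is a baseline only and does not achieve the time bounds stated above. The names `compare_suffixes` and `sparse_suffix_array` and the 0-indexed positions are assumptions made for illustration.

```python
from functools import cmp_to_key


def compare_suffixes(t: str, i: int, j: int) -> int:
    """Three-way lexicographic comparison of t[i:] and t[j:] without slicing."""
    n = len(t)
    while i < n and j < n:
        if t[i] != t[j]:
            return -1 if t[i] < t[j] else 1
        i += 1
        j += 1
    if i == n and j == n:
        return 0          # only possible when both positions were equal
    return -1 if i == n else 1   # the suffix that ran out is a proper prefix


def sparse_suffix_array(t: str, positions):
    """Sort the given positions by the suffixes of t that start there."""
    return sorted(positions, key=cmp_to_key(lambda i, j: compare_suffixes(t, i, j)))


if __name__ == "__main__":
    # Suffixes at 1, 3, 5 are "anana", "ana", "a"; sorted order is 5, 3, 1.
    print(sparse_suffix_array("banana", [1, 3, 5]))
```

Each comparison here can scan a long common prefix, which is the cost the longest common prefix technique in the abstract is meant to avoid.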


Time–space trade-offs for Lempel–Ziv compressed indexing

Bille, Ettienne, Gørtz et al. 2018

Theoretical Computer Science


Given a string S, the compressed indexing problem is to preprocess S into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of S while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. We obtain the following time-space trade-offs: For constant-sized alphabets [...] where n and m are the length of the input string and query string respectively, z is the number of phrases in the LZ77 parse of the input string, occ is the number of occurrences of the query in the input and ε > 0 is an arbitrarily small constant. In particular, (i) improves the leading term in the query time of the previous best solution from O(m lg m) to O(m) at the cost of increasing the space by a factor lg lg z. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of O(m(1 + lg^ε z / lg(n/z))). However, for any polynomial compression ratio, i.e., z = O(n^(1−δ)) for constant δ > 0, this becomes O(m). Our index also supports extraction of any substring of length ℓ in O(ℓ + lg(n/z)) time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.
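To give a feel for the parameter z, here is a minimal Python sketch of a greedy LZ77-style parse. It assumes one common variant: each phrase is the longest prefix of the remaining text that also starts at an earlier position (overlap with the phrase itself is allowed), or a single fresh character when no such match exists. The paper's exact definition may differ, and this quadratic-time scan is for illustration only. The name `lz77_phrases` is hypothetical.

```python
def lz77_phrases(s: str):
    """Greedy LZ77-style parse of s; returns the list of phrases.

    Assumed variant: a phrase is the longest match with a source starting
    before the current position, or one new character if there is none.
    """
    phrases = []
    n = len(s)
    i = 0
    while i < n:
        best = 0
        for start in range(i):              # try every earlier source position
            k = 0
            while i + k < n and s[start + k] == s[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)               # no match: emit a single character
        phrases.append(s[i:i + length])
        i += length
    return phrases


if __name__ == "__main__":
    parse = lz77_phrases("abcabcabc")
    # Three fresh characters, then one long self-referential copy.
    print(parse, "z =", len(parse))
```

Highly repetitive inputs give a z far smaller than n, which is the regime where space bounds expressed in z pay off.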


