Héctor Ferrada scite author profile

Journal of Discrete Algorithms

2017

Locally Compressed Suffix Arrays

ACM J. Exp. Algorithmics

2015

Compressed text (self-)indexes have matured up to a point where they can replace a text by a data structure that requires less space and, in addition to giving access to arbitrary text passages, support indexed text searches. At this point those indexes are competitive with traditional text indexes (which are very large) for counting the number of occurrences of a pattern in the text. Yet, they are still hundreds to thousands of times slower when it comes to locating those occurrences in the text. In this paper we introduce a new, local, compression scheme for suffix arrays which permits locating the occurrences extremely fast, while still being much smaller than classical indexes. The core of our contribution is the identification of the regularities exploited by the compression based on function Ψ, used for a long time in compressed text indexing, with those exploited by Re-Pair on the differential suffix array. The latter enjoys the locality properties that the former methods lack. As another consequence of this locality, we show that our index can be implemented in secondary memory, where its access time improves thanks to compression, instead of worsening as is the norm in other self-indexes. Finally, some byproducts of our work, such as a compressed dictionary representation for Re-Pair, can be of independent interest.
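To make the idea of "Re-Pair on the differential suffix array" concrete, the following is a minimal sketch, not the paper's data structure. It assumes a naive suffix array built by sorting, a differential array that stores the first entry followed by consecutive differences, and a single Re-Pair round that replaces the most frequent adjacent pair with a new symbol. The function names and the symbol label `R0` are illustrative only.

```python
from collections import Counter


def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def differential(sa):
    """Keep the first entry, then store differences between neighbours."""
    return [sa[0]] + [sa[i] - sa[i - 1] for i in range(1, len(sa))]


def repair_step(seq, new_symbol):
    """One Re-Pair round: replace the most frequent adjacent pair.

    Frequencies are counted with overlaps, which is good enough for a
    sketch; a full Re-Pair implementation counts non-overlapping
    occurrences and repeats until no pair occurs twice.
    """
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return seq, None
    pair, freq = pairs.most_common(1)[0]
    if freq < 2:
        return seq, None
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)  # pair replaced by the new rule symbol
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out, pair


sa = suffix_array("abababab$")
diff = differential(sa)
compressed, rule = repair_step(diff, "R0")
print(sa)          # suffix array of the repetitive text
print(diff)        # long runs of equal differences appear
print(compressed)  # shorter sequence after one replacement round
print(rule)        # the pair that R0 stands for
```

On a repetitive text the differential array contains repeated pairs of values, which is the kind of regularity a grammar compressor such as Re-Pair can capture. Because each rule expands to a short, fixed sequence of differences, a range of the array can be decoded from nearby symbols only, which is one way to read the locality claim in the abstract.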


A Lempel-Ziv Compressed Structure for Document Listing

2013

Document listing is the problem of preprocessing a set of sequences, called documents, so that later, given a short string called the pattern, we retrieve the documents where the pattern appears. While optimal-time and linear-space solutions exist, the current emphasis is on reducing the space requirements. Current document listing solutions build on compressed suffix arrays. This paper is the first attempt to solve the problem using a Lempel-Ziv compressed index of the text collection. We show that the resulting solution is very fast to output most of the resulting documents, taking more time for the final ones. This makes this index particularly useful for interactive scenarios or when listing some documents is sufficient. Yet, it also offers a competitive space/time tradeoff when returning the full answers.
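The document listing problem itself can be illustrated with a small uncompressed baseline. The sketch below is not the Lempel-Ziv index from the paper; it assumes a plain generalized suffix array over all documents, built by sorting, and answers a query by binary search followed by a scan of the matching range. The names `build_index` and `list_documents` are hypothetical.

```python
import bisect


def build_index(documents):
    """Sort every (doc_id, offset) suffix of every document lexicographically."""
    entries = [(d, i) for d, doc in enumerate(documents) for i in range(len(doc))]
    entries.sort(key=lambda e: documents[e[0]][e[1]:])
    # Materialising the suffixes is wasteful but keeps the sketch short.
    keys = [documents[d][i:] for d, i in entries]
    return entries, keys


def list_documents(index, pattern):
    """Return the sorted ids of the documents that contain the pattern."""
    entries, keys = index
    # Suffixes that start with the pattern form one contiguous range
    # beginning at the first suffix that is >= pattern.
    j = bisect.bisect_left(keys, pattern)
    found = set()
    while j < len(keys) and keys[j].startswith(pattern):
        found.add(entries[j][0])
        j += 1
    return sorted(found)


docs = ["banana", "bandana", "cabana"]
index = build_index(docs)
print(list_documents(index, "ana"))  # every document contains "ana"
print(list_documents(index, "nd"))   # only "bandana"
```

This baseline visits every occurrence of the pattern, so a document with many occurrences is rediscovered many times. Avoiding that redundant work while using compressed space is what dedicated document listing structures aim for.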


A filtering technique for fast Convex Hull construction in R2

Journal of Computational and Applied Mathematics

Hitschfeld

2020