Roberto Konow scite author profile

Navarro

2013

An optimal index solving top-k document retrieval [Navarro and Nekrich, SODA'12] takes O(m + k) time for a pattern of length m, but its space is at least 80n bytes for a collection of n symbols. We reduce it to 1.5n-3n bytes, with O(m+(k+log log n) log log n) time, on typical texts. The index is up to 25 times faster than the best previous compressed solutions, and requires at most 5% more space in practice (and in some cases as little as one half). Apart from replacing classical by compressed data structures, our main idea is to replace suffix tree sampling by frequency thresholding to achieve compression.

Faster and smaller inverted indices with treaps

Navarro

Clarke

et al. 2013

We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. To achieve compression we represent the treap topology using compact data structures. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. Results show that the space consumption is below 10% of the size of the corpus and the index performs queries up to twice as fast than previous compact representations, which in addition require more space. Modern two-stage (massive filtering / detailed ranking) information retrieval systems would benefit from this boosting of the filtration stage of the query resolution process, which would free more resources for the ranking stage, thus enabling more precise results within a given time budget.

Aggregated 2D range queries on clustered points

Brisaboa

Bernardo²,

et al. 2016

Information Systems

Efficient processing of aggregated range queries on two-dimensional grids is a common requirement in information retrieval and data mining systems, for example in Geographic Information Systems and OLAP cubes. We introduce a technique to represent grids supporting aggregated range queries that requires little space when the data points in the grid are clustered, which is common in practice. We show how this general technique can be used to support two important types of aggregated queries, which are ranked range queries and counting range queries. Our experimental evaluation shows that this technique can speed up aggregated queries up to more than an order of magnitude, with a small space overhead.

Dual-Sorted Inverted Lists in Practice

Navarro

2012

Abstract. We implement a recent theoretical proposal to represent inverted lists in memory, in a way that docid-sorted and weight-sorted lists are simultaneously represented in a single wavelet tree data structure. We compare our implementation with classical representations, where the ordering favors either bag-of-word queries or Boolean and weighted conjunctive queries, and demonstrate that the new data structure is faster than the state of the art for conjunctive queries, while it offers an attractive space/time tradeoff when both kinds of queries are of interest.