2013
DOI: 10.1007/978-3-642-38905-4_12

Document Listing on Repetitive Collections

Abstract: Many document collections consist largely of repeated material, and several indexes have been designed to take advantage of this. There has been only preliminary work, however, on document retrieval for repetitive collections. In this paper we show how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document listing. In our experiments, our additional structures on top of the RLCSA can reduce the query time for document listing by an order of magnitude…
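To make the document listing problem concrete, here is a minimal, uncompressed Python sketch: the documents are concatenated with a separator, a plain suffix array is built naively, the suffix-array interval of the pattern is found by binary search, and every suffix in that interval is mapped back to its document. This is not the RLCSA or the structures from the paper; the function names and the `\x01` separator are illustrative assumptions. Listing documents this way costs time proportional to the number of occurrences rather than the number of distinct documents reported, which is the cost that dedicated document listing structures try to avoid.

```python
from bisect import bisect_right


def build_collection(docs):
    """Concatenate documents, each followed by a separator (assumed absent from patterns)."""
    text = ""
    starts = []  # starts[d] = offset of document d in text
    for d in docs:
        starts.append(len(text))
        text += d + "\x01"
    return text, starts


def suffix_array(text):
    # Naive construction by sorting suffixes; only suitable for toy inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_range(text, sa, pattern):
    """Return the half-open interval [left, right) of suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return left, lo


def list_documents(text, starts, sa, pattern):
    """Brute-force document listing: map every occurrence to its document id."""
    left, right = sa_range(text, sa, pattern)
    return sorted({bisect_right(starts, sa[k]) - 1 for k in range(left, right)})
```

For example, with the documents `banana`, `bandana` and `cabana`, the pattern `nd` is reported only in document 1, while `ana` is reported in all three.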

Cited by 16 publications (25 citation statements)
References 20 publications

“…Although they perform reasonably well in practice, none of the existing structures for document listing on repetitive collections [14,23] offer good worst-case time guarantees combined with worst-case space guarantees that are appropriate for repetitive collections, that is, growing with n+s rather than with N. In this paper we present the first document listing index offering good guarantees in space and time for repetitive collections: our index […] That is, at the price of being an O(lg D) space factor away from what could be hoped from a grammar-based index, our index offers document listing with useful time bounds per listed document.…”
Section: Our Contributions (mentioning, confidence: 99%)

“…We do not store the lists themselves in various orders, but just succinct range minimum query (RMQ) data structures [19] that allow implementing document listing on ranges of lists [51]. Even those RMQ structures are too large for our purposes, so they are further compressed exploiting the fact that their underlying data has long increasing runs, so the structures are reduced with techniques analogous to those developed for the ILCP data structure [23].…”
Section: Our Contributions (mentioning, confidence: 99%)

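The idea of implementing document listing on a range with RMQ can be illustrated with Muthukrishnan's classic technique. Over a document array DA (the document of each suffix-array position), store C[i], the previous position holding the same document, or -1 if there is none. Within a query range [l, r], a position whose C value is below l is the first occurrence of its document in the range, so repeatedly taking range minima of C and splitting the range reports each distinct document exactly once. This sketch uses a plain sparse table rather than the succinct, run-compressed RMQ structures described in the quote, and all names are hypothetical.

```python
def build_c_array(da):
    """C[i] = largest j < i with da[j] == da[i], or -1 if no such j exists."""
    last = {}
    c = []
    for i, d in enumerate(da):
        c.append(last.get(d, -1))
        last[d] = i
    return c


class SparseTableRMQ:
    """Plain (non-succinct) sparse table: O(n log n) space, O(1) argmin queries."""

    def __init__(self, values):
        self.values = values
        n = len(values)
        self.table = [list(range(n))]  # table[k][i] = argmin of values[i .. i + 2^k - 1]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[-1], 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                a, b = prev[i], prev[i + half]
                row.append(a if values[a] <= values[b] else b)
            self.table.append(row)
            k += 1

    def argmin(self, l, r):
        """Index of a minimum value in values[l..r], both ends inclusive."""
        k = (r - l + 1).bit_length() - 1
        a = self.table[k][l]
        b = self.table[k][r - (1 << k) + 1]
        return a if self.values[a] <= self.values[b] else b


def list_distinct(da, c, rmq, l, r):
    """Report each distinct document in da[l..r] once (in discovery order)."""
    out = []
    stack = [(l, r)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        m = rmq.argmin(lo, hi)
        if c[m] >= l:  # minimum already inside [l, r]: no new documents here
            continue
        out.append(da[m])  # m is the first occurrence of da[m] within [l, r]
        stack.append((lo, m - 1))
        stack.append((m + 1, hi))
    return out
```

For instance, with DA = [0, 1, 0, 2, 1, 0, 2], querying positions 5 to 6 reports documents 0 and 2. Each reported document costs a constant number of RMQ calls, so the work depends on the number of distinct documents rather than on the range length.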
“…(6) Indexing a highly repetitive or a highly similar document collection is an active line of research. In recent work, Gagie et al [2013] propose an efficient document retrieval index suitable for a repetitive collection. An open problem is to extend the result for handling top-k queries.…”
Section: Results (mentioning, confidence: 99%)