2017
DOI: 10.1007/s10791-017-9297-7

Document retrieval on repetitive string collections

Abstract: Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natur…

citations
Cited by 14 publications
(43 citation statements)
references
References 47 publications
“…This is confirmed by the experiments. We showcase our new solution in one specific self-index-based document retrieval framework, but we point out that this component can also be utilized in other variants as presented by Gagie et al. [3].…”
Section: Results
mentioning
confidence: 99%
“…E.g. for d_0 we mark the root as it is the LCA of the suffix pair (5,0), and node v_9 for suffix pair (0,2), again the root for (2,4), and v_13 for (4,1) and (1,3). Connecting all nodes marked with a specific d_i (see the green arrows in Fig.…”
Section: The Basic Framework and Data Structures
mentioning
confidence: 99%
“…To date, there exist several pattern matching indexes for repetitive text collections (see a couple of studies [21,10] and references therein). However, there are not many document retrieval indexes for repetitive text collections [5,8,23]. Most of these indexes [26,8] rely on a pattern-matching index that needs Ω(n) bits in order to offer O(lg n) time per retrieved document. In this paper we introduce new simple and efficient document listing indexes aimed at highly repetitive text collections.…”
mentioning
confidence: 99%
“…However, there are not many document retrieval indexes for repetitive text collections [5,8,23]. Most of these indexes [26,8] rely on a pattern-matching index that needs Ω(n) bits in order to offer O(lg n) time per retrieved document. In this paper we introduce new simple and efficient document listing indexes aimed at highly repetitive text collections. Like various preceding indexes, we achieve O(m + ndoc · lg n) search time, yet our indexes are way faster and/or smaller than previous ones on various repetitive datasets, because they escape from the space/time tradeoff of the pattern-matching index.…”
mentioning
confidence: 99%
“…Their work, using large query logs, provides new insights into the relative efficiency of selective search compared to exhaustive random sharding, how to distribute those shards across machines, and yields details of trade-offs possible between throughput and latency constraints. Gagie et al. (2017) examine indexing for repetitive collections. Their work includes effective compression techniques, methods for top-k retrieval and identifying the number of documents containing a given string.…”
mentioning
confidence: 99%
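
The citation statements above refer to three document retrieval operations: document listing, counting the documents that contain a given string, and top-k retrieval. As a rough illustration of what these operations compute, here is a minimal Python sketch over a naive, uncompressed sorted list of suffixes. It is not the compressed index studied in the paper or in any of the citing works, and all names (`build_index`, `list_documents`, `count_documents`, `top_k`) are hypothetical, chosen only for this example.

```python
from bisect import bisect_left
from collections import Counter


def build_index(docs):
    """Naive generalized suffix array: every suffix of every document, sorted.

    Returns two parallel lists (suffix strings, owning document ids).
    Quadratic space; an illustration only, not a compressed index.
    """
    pairs = sorted(
        (doc[start:], doc_id)
        for doc_id, doc in enumerate(docs)
        for start in range(len(doc))
    )
    return [s for s, _ in pairs], [d for _, d in pairs]


def _matching_doc_ids(index, pattern):
    """Document id of every occurrence of pattern (one entry per occurrence)."""
    suffixes, doc_ids = index
    # Suffixes that start with the pattern form one contiguous sorted range.
    lo = bisect_left(suffixes, pattern)
    hi = lo
    while hi < len(suffixes) and suffixes[hi].startswith(pattern):
        hi += 1
    return doc_ids[lo:hi]


def list_documents(index, pattern):
    """Document listing: distinct documents in which pattern occurs."""
    return sorted(set(_matching_doc_ids(index, pattern)))


def count_documents(index, pattern):
    """Document counting: number of distinct documents containing pattern."""
    return len(set(_matching_doc_ids(index, pattern)))


def top_k(index, pattern, k):
    """Top-k retrieval by term frequency: (doc_id, occurrences) pairs."""
    return Counter(_matching_doc_ids(index, pattern)).most_common(k)


docs = ["abracadabra", "abracadabro", "cadabra"]
index = build_index(docs)
print(list_documents(index, "cadabra"))
print(count_documents(index, "abra"))
print(top_k(index, "abra", 1))
```

The sketch walks over every occurrence of the pattern, so its cost grows with the number of occurrences rather than with the number of documents reported. Avoiding that dependence, while also keeping the index small on repetitive collections, is the kind of trade-off the quoted statements discuss.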