2019
DOI: 10.1007/978-3-030-32686-9_34

Fast, Small, and Simple Document Listing on Repetitive Text Collections

Abstract: Document listing on string collections is the task of finding all documents where a pattern appears. It is regarded as the most fundamental document retrieval problem, and is useful in various applications. Many of the fastest-growing string collections are composed of very similar documents, such as versioned code and document collections, genome repositories, etc. Plain pattern-matching indexes designed for repetitive text collections achieve orders-of-magnitude reductions in space. Instead, there are not ma…

Cited by 6 publications (15 citation statements)
References: 26 publications
“…Following the ideas for the document listing problem proposed in [9], we grammar compress DA producing a binary and balanced grammar of ν non-terminals, that can be stored in O(r log(n/r)) bits [12]. Let T be the parse tree of the document array DA[1..n], given a non terminal node nt ∈ T let DA[s_nt..e_nt] be its expansion.…”
Section: Precomputed Document List With Frequencies
Type: mentioning
confidence: 99%
“…• GCDA-PDL: Grammar-Compressed Document Array with Precomputed Document Lists. Solution described in Section 4.1, using balanced Re-Pair for DA and sampling the sparse tree as in [9].…”
Section: Algorithms
Type: mentioning
confidence: 99%
“…Information retrieval is a process and a technique that allows information users to find relevant information they need through a dataset, where information is organized in a certain way [1,2,3]. A document retrieval system contains mainly three parts: index generation, query processing, and document retrieval.…”
Section: Introduction
Type: mentioning
confidence: 99%