Proceedings of the 19th ACM International Conference on Information and Knowledge Management 2010
DOI: 10.1145/1871437.1871594

Improved index compression techniques for versioned document collections

Abstract: Current Information Retrieval systems use inverted index structures for efficient query processing. Due to the extremely large size of many data sets, these index structures are usually kept in compressed form, and many techniques for optimizing compressed size and query processing speed have been proposed. In this paper, we focus on versioned document collections, that is, collections where each document is modified over time, resulting in multiple versions of the document. Consecutive versions of the same do…

Cited by 23 publications (49 citation statements)
References 32 publications
“…Our experiments also show that other classical encodings, such as Simple9 [1] and PforDelta [26], perform surprisingly well on repetitive collections, yet they still require 5 times more space than ours. Our techniques still do not match the performance of He et al's methods [12] when their assumptions hold, but these methods are not universal.…”
Section: Introduction (mentioning)
confidence: 66%
“…Given a parameter B, it samples the universe of size u at intervals $2^{\lceil \log_2 (uB/\ell) \rceil}$. In the particular case of highly repetitive collections, the best figures so far have been presented by He et al [12] in the non-positional case. They model versioned document collections using so-called two-level indexes.…”
Section: Data Structures For Inverted Lists (mentioning)
confidence: 96%