Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 2013
DOI: 10.1145/2484028.2484079

Document identifier reassignment and run-length-compressed inverted indexes for improved search performance

Abstract: Text search engines are a fundamental tool. Their efficiency relies on a popular and simple data structure: the inverted index. Inverted indexes can now be represented very efficiently using index compression schemes. Recent work also studies how an optimized document ordering can be used to assign document identifiers (docIDs) to the document database, which yields important improvements in index compression and query processing time. In this paper we follow this line of research, yet…

Cited by 15 publications (3 citation statements)
References 34 publications

“…Thus they sacrifice space in order to improve runtime performance, while the remaining types of ordering exploit information from the dataset to improve both space and runtime performance. Matrix: Reordering by manipulating the document vs. term matrix can produce improvements in space by grouping documents with high frequency terms [21], producing a block diagonal matrix [5], or creating run-length encodable portions of the matrix [3]. Manipulating the matrix for large datasets can be expensive, and merging subindexes can be difficult, so these techniques have not been widely used.…”
Section: Reordering (mentioning)
confidence: 99%
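
To illustrate what "run-length encodable" means in the statement above, here is a minimal sketch. It is not the cited paper's method. The posting list, the `new_id` mapping, and the function names `runs` and `remap` are all hypothetical, chosen only to show how reassigning docIDs can turn scattered identifiers into one consecutive run.

```python
def runs(postings):
    """Collapse a sorted docID list into (start, length) runs of consecutive IDs."""
    out = []
    for doc_id in postings:
        # Extend the current run when doc_id directly follows its last member.
        if out and doc_id == out[-1][0] + out[-1][1]:
            out[-1] = (out[-1][0], out[-1][1] + 1)
        else:
            out.append((doc_id, 1))
    return out


def remap(postings, new_id):
    """Apply a docID reassignment (old ID -> new ID) and re-sort the list."""
    return sorted(new_id[d] for d in postings)


# Hypothetical toy data: one term's posting list and an assumed reassignment
# that places the four documents containing the term next to each other.
postings = [2, 9, 17, 23]
new_id = {2: 0, 9: 1, 17: 2, 23: 3}

before = runs(postings)
after = runs(remap(postings, new_id))
print(len(before), "runs before reassignment;", len(after), "run after")
```

In this toy case the four single-document runs become one run of length four, so the list can be stored as a single (start, length) pair.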
“…Hence, integer compression schemes are typically applied (e.g., Williams and Zobel [1999] and Trotman [2003]). Compression can also be improved by optimizing the ordering of the postings and storing the differences in consecutive document identifiers instead of the actual DocIDs (e.g., Yan et al [2009] and Arroyuelo et al [2013]). …”
Section: Indexing Methods (mentioning)
confidence: 99%
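
The idea in the statement above, storing differences between consecutive docIDs rather than the docIDs themselves, can be sketched as follows. This is an illustration under stated assumptions, not the scheme of any cited paper. It pairs gap encoding with a simple variable-byte code, and the function names and the sample posting list are hypothetical.

```python
def to_gaps(postings):
    """Turn a sorted docID list into differences between consecutive IDs."""
    gaps, prev = [], 0
    for doc_id in postings:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Invert to_gaps with a running sum."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, high bit set on the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits; more bytes follow
            n >>= 7
        out.append(n | 0x80)      # final byte of this number
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


# Hypothetical posting list: large docIDs that lie close together.
postings = [1000, 1003, 1010, 1100]
gaps = to_gaps(postings)
raw_size = len(vbyte_encode(postings))
gap_size = len(vbyte_encode(gaps))
print(gaps, raw_size, gap_size)
```

Small gaps fit in fewer bytes than the raw identifiers, which is why an ordering that places co-occurring documents at nearby docIDs tends to help compression.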
“…The faster transfer can outweigh the time required for decompression, thus speeding up retrieval overall and increasing the query throughput. Seeking for more efficient compression techniques for the inverted index continues to be an active research area (e.g., Trotman [2003], Arroyuelo et al [2013], and Konow et al [2013]). With compression, the inverted index is shown to be superior, both in terms of storage cost and the time needed to respond to typical queries [Witten et al 1999], compared with other indexing methods such as suffix arrays [Manber and Myers 1993] and signature files [Christodoulakis 1984, 1987; Croft and Savino 1988; Wong and Lee 1990; Lee et al 1995].…”
Section: Introduction (mentioning)
confidence: 99%