Proceedings of the 18th ACM Conference on Information and Knowledge Management 2009
DOI: 10.1145/1645953.1646010

On-line index maintenance using horizontal partitioning

Abstract: In this paper, we propose a new merge-based index maintenance strategy for Information Retrieval systems. The new model is based on partitioning the inverted index across its terms. We exploit the query log to partition the on-disk inverted index into two types of sub-indexes. Inverted lists of the terms contained in queries that are frequently posed to the Information Retrieval system are kept in one partition, called the frequent-term index, and the other inverted lists form another partition, calle…
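The partitioning idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated and does not say how "frequently posed" is decided, so the `top_k` cutoff, the function name `partition_by_query_log`, and the in-memory dictionary layout are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def partition_by_query_log(inverted_index, query_log, top_k):
    """Split an inverted index into frequent-term and infrequent-term parts.

    inverted_index: dict mapping term -> list of postings (hypothetical layout).
    query_log: iterable of query strings.
    top_k: assumed cutoff; the top_k most-queried indexed terms are "frequent".
    """
    # Count how often each indexed term appears in the logged queries.
    counts = Counter(
        term
        for query in query_log
        for term in query.lower().split()
        if term in inverted_index
    )
    frequent_terms = {term for term, _ in counts.most_common(top_k)}

    # Terms that were queried often go to one partition, the rest to the other.
    frequent_index = {
        t: p for t, p in inverted_index.items() if t in frequent_terms
    }
    infrequent_index = {
        t: p for t, p in inverted_index.items() if t not in frequent_terms
    }
    return frequent_index, infrequent_index
```

Calling `partition_by_query_log(index, log, top_k=2)` returns two dictionaries that together contain every term of `index` exactly once, so each partition could then be maintained with its own merge policy.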

Cited by 12 publications (5 citation statements)
References 15 publications
“…The accumulated postings are routinely combined with the rest of the data in a hierarchy based on geometric partitioning [8]. More advanced techniques exist [7,9], but they would not change the tradeoffs that we show. We use PForDelta [15] for compression, which has shown efficient decompression performance in recent studies [13].…”
Section: Methods (mentioning)
confidence: 93%
“…For index update, Margaritis and Anastasiadis (2009) only flush selectively the terms with most posting lists in memory into disk to merge it with primary index when the memory gets full with new posting lists. Gurajada and Kumar (2009) propose a new merge-based index maintenance strategy for information retrieval systems. This strategy partitions the index into frequent-term index and infrequent-term index based on the frequency of terms, and uses a lazy-merge strategy for maintaining infrequent-term index and an active merge strategy for maintaining frequent-term index.…”
Section: Related Work (mentioning)
confidence: 99%
“…Each query event includes the fields id, query terms, and time. Although controversial, the query log is publicly-available and used widely (e.g., [20], [22], [10], [15], [19], and [1]), which simplifies comparisons to past and future results. Existing research has also investigated how to effectively anonymize such query logs (e.g., [21], [11], and [14]).…”
Section: Query Log (mentioning)
confidence: 99%