2006
DOI: 10.1007/11735106_21

A Hybrid Approach to Index Maintenance in Dynamic Text Retrieval Systems

Abstract: In-place and merge-based index maintenance are the two main competing strategies for on-line index construction in dynamic information retrieval systems based on inverted lists. Motivated by recent results for both strategies, we investigate possible combinations of in-place and merge-based index maintenance. We present a hybrid approach in which long posting lists are updated in-place, while short lists are updated using a merge strategy. Our experimental results show that this hybrid approach achie…

Cited by 13 publications (6 citation statements)
References 11 publications
“…More recent works on sequential systems are mainly focused on on-line incremental updates over disk-based, inverted indexes [7,8]. In [7], the authors propose a hybrid indexing technique. The proposed method merges small posting lists with the already existing index, while using posting list reallocation for large posting lists.…”
Section: Related Work (mentioning)
confidence: 99%
“…It reduces the index building time but may increase the search time of the long terms due to posting retrievals from both the in-place and the merge-based part. Another variation of hybrid methods also exists that categorizes terms into short or long according to the total postings accumulated in the system rather than those only in memory and the merge-based part [4,5]. This variation increases the building time, but keeps the postings of each term in only one of the merge-based and in-place parts.…”
Section: Related Work (mentioning)
confidence: 99%
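The two ways of classifying a term that the statement above contrasts can be sketched as follows. This is only an illustration of one reading of the quoted text: the function name `is_long`, its parameters, and the strict `>` comparison are assumptions, not taken from any of the cited papers.

```python
def is_long(in_memory, in_merge_part, in_place_part, threshold, use_total=True):
    """Decide whether a term counts as 'long' (hypothetical helper).

    in_memory, in_merge_part, in_place_part: number of postings the term
    currently has in each location.
    use_total=True  counts all postings accumulated in the system.
    use_total=False counts only postings in memory and in the merge-based part.
    """
    if use_total:
        size = in_memory + in_merge_part + in_place_part
    else:
        size = in_memory + in_merge_part
    return size > threshold
```

For a term with 1000 postings already in the in-place part and 5 new postings in memory, a threshold of 100 makes the term long under the total count but short under the partial count. Under the partial count its new postings would go to the merge-based part, which matches the quoted remark that such a term may need retrievals from both parts.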
“…To the best of our knowledge this is the first time that a method simultaneously combines the above features. Previous methods distributed the infrequent terms randomly across different blocks [25] unless they managed them individually [3,27], only flushed partially the frequent terms from memory to disk [4,5], and obtained limited benefits from block-based storage management because they only considered small blocks of a few kilobytes [3,25].…”
Section: Introduction (mentioning)
confidence: 99%
“…Earlier this year, we have presented a family of hybrid index maintenance strategies based on this distinction between short and long lists [2]. The basic idea is rather simple: As soon as the posting list for a given term exceeds a certain length (we refer to this as the long list threshold, denoted as T), it is declared long and moved from the merge-updated part of the on-disk index to the in-place-maintained part.…”
Section: Hybrid Index Maintenance (mentioning)
confidence: 99%
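The threshold rule described in the statement above can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the class `HybridIndex`, its method names, and the use of Python dictionaries to stand in for the on-disk index parts are all assumptions made for illustration. Only the threshold T and the move from the merge-updated part to the in-place part come from the quoted text.

```python
from collections import defaultdict


class HybridIndex:
    """Toy model: short lists are merge-updated, long lists are updated in place."""

    def __init__(self, long_list_threshold):
        self.T = long_list_threshold      # long list threshold T
        self.memory = defaultdict(list)   # postings buffered since the last flush
        self.merge_part = {}              # short lists, rewritten on every flush
        self.in_place_part = {}           # long lists, appended to on every flush

    def add_posting(self, term, doc_id):
        self.memory[term].append(doc_id)

    def flush(self):
        """Move buffered postings into the simulated on-disk index."""
        # In-place update: terms already declared long only get new postings appended.
        for term, new_postings in self.memory.items():
            if term in self.in_place_part:
                self.in_place_part[term].extend(new_postings)

        # Merge update: all remaining lists are rewritten into a new merge part.
        new_merge_part = {}
        for term in set(self.merge_part) | set(self.memory):
            if term in self.in_place_part:
                continue  # already handled by the in-place update above
            postings = self.merge_part.get(term, []) + self.memory.get(term, [])
            if len(postings) > self.T:
                # The list exceeds T: declare it long and move it to the in-place part.
                self.in_place_part[term] = postings
            else:
                new_merge_part[term] = postings

        self.merge_part = new_merge_part
        self.memory.clear()

    def lookup(self, term):
        """Return all postings for a term, including unflushed ones."""
        on_disk = self.in_place_part.get(term, self.merge_part.get(term, []))
        return on_disk + self.memory.get(term, [])
```

With `HybridIndex(long_list_threshold=3)`, a term stays in `merge_part` while its list holds at most three postings. On the first flush that takes it past three it moves to `in_place_part`, and later flushes only append to it there.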
“…Recently, we have presented a family of hybrid strategies, based on a distinction between short and long posting lists [2]. Hybrid strategies maintain short posting lists following a merge-based approach, while long lists are updated in-place.…”
Section: Introduction (mentioning)
confidence: 99%