Efficient query evaluation using a two-level retrieval process

Broder, Andrei Z.; Carmel, David; Herscovici, Michael; Soffer, Aya; Zien, Jason Y.

doi:10.1145/956863.956944

Cited by 319 publications

(330 citation statements)

References 19 publications

Supporting

Mentioning

324

Contrasting

“…In order to get the successor of 49, firstly we find the node which has the longest common prefix of 00110001. We run binary search over the 8 hash tables and get 001 in Table [6]. After that, we can find the successor via its right pointer which pointed to the successor node at the leaf level.…”

Section: Data Structure (mentioning)

confidence: 99%

A Set Intersection Algorithm Via x-Fast Trie

Ye

2016

JCP

This paper proposes a simple intersection algorithm for two sorted integer sequences. Our algorithm is designed based on x-fast trie since it provides efficient find and successor operators. We present that our algorithm outperforms skip list based algorithm when one of the sets to be intersected is relatively 'dense' while the other one is (relatively) 'sparse'. Finally, we propose some possible approaches which may optimize our algorithm further.
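
The successor lookup described in the quoted passage, and the way a successor operator can drive an intersection, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the cited paper's implementation: the names `XFastSketch` and `intersect` and the sample keys are hypothetical, each per-level hash table stores the (min, max) key of a prefix's subtree instead of descendant pointers, and a sorted list with `bisect` stands in for the linked list of leaves.

```python
from bisect import bisect_left, bisect_right

W = 8  # key width in bits; keys are assumed to lie in [0, 2**W)


class XFastSketch:
    """Simplified x-fast-trie-style index (illustrative only)."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))
        # levels[l] maps every l-bit prefix to (min_key, max_key) below it.
        self.levels = [dict() for _ in range(W + 1)]
        for k in self.keys:
            for l in range(W + 1):
                p = k >> (W - l)
                lo, hi = self.levels[l].get(p, (k, k))
                self.levels[l][p] = (min(lo, k), max(hi, k))

    def longest_prefix_len(self, x):
        # Binary search over the W + 1 hash tables: if an l-bit prefix of x
        # is present, every shorter prefix of x is present as well.
        lo, hi = 0, W
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if (x >> (W - mid)) in self.levels[mid]:
                lo = mid
            else:
                hi = mid - 1
        return lo

    def successor(self, x):
        """Return the smallest stored key >= x, or None if there is none."""
        if not self.keys:
            return None
        l = self.longest_prefix_len(x)
        if l == W:
            return x  # x itself is stored
        p = x >> (W - l)
        next_bit = (x >> (W - l - 1)) & 1
        if next_bit == 0:
            # Only the right child exists; its minimum is the successor.
            return self.levels[l + 1][(p << 1) | 1][0]
        # Only the left child exists; its maximum is the predecessor of x,
        # and the successor is the next key in sorted order.
        pred = self.levels[l + 1][p << 1][1]
        i = bisect_right(self.keys, pred)
        return self.keys[i] if i < len(self.keys) else None


def intersect(sparse, dense):
    """Intersect a sorted list of distinct ints with an XFastSketch."""
    out, i = [], 0
    while i < len(sparse):
        s = dense.successor(sparse[i])
        if s is None:
            break  # nothing in dense is >= the remaining candidates
        if s == sparse[i]:
            out.append(s)
            i += 1
        else:
            # Skip every candidate smaller than the successor just found.
            i = bisect_left(sparse, s, i + 1)
    return out


dense = XFastSketch([3, 17, 40, 52, 200])
print(dense.successor(49))                  # 52
print(intersect([17, 49, 52, 201], dense))  # [17, 52]
```

Each successor call probes about log2(W) hash tables rather than walking all W levels, which is the property the quoted passage relies on.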

“…However, this also means that an entire block must be decompressed even when just a single posting is required from it (e.g. for partial scoring approaches such as WAND [15]). Moreover, it is possible to obtain a larger output than the input when there are not enough integers to compress, because extra space is required in the output to store information needed at decompression time.…”

Section: List-adaptive Codecs (mentioning)

confidence: 99%
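
Both effects mentioned in this statement (whole-block decompression, and output that can exceed the input for very short blocks) can be seen in a small Frame of Reference style sketch. It is only an illustration under assumptions: the function names and the 6-byte header layout (4-byte base, 1-byte bit width, 1-byte count) are hypothetical and are not taken from the codecs evaluated in the cited paper; blocks are assumed non-empty, with at most 255 values below 2**32.

```python
def for_encode(block):
    """Pack a non-empty block of ints as offsets from the block minimum."""
    base = min(block)
    width = max(v - base for v in block).bit_length()  # bits per offset
    packed = 0
    for i, v in enumerate(block):
        packed |= (v - base) << (i * width)
    nbytes = (len(block) * width + 7) // 8
    # Header: information the decoder needs before it can unpack anything.
    header = base.to_bytes(4, "little") + bytes([width, len(block)])
    return header + packed.to_bytes(nbytes, "little")


def for_decode(data):
    """Decode a whole block; there is no way to read one value in isolation
    without first parsing the header and locating its bit offset."""
    base = int.from_bytes(data[:4], "little")
    width, n = data[4], data[5]
    packed = int.from_bytes(data[6:], "little")
    mask = (1 << width) - 1
    return [base + ((packed >> (i * width)) & mask) for i in range(n)]


block = [1000, 1003, 1001, 1010, 1007]
encoded = for_encode(block)
print(len(encoded), for_decode(encoded) == block)  # 9 True
print(len(for_encode([7])))  # 6: larger than one raw 4-byte integer
```

With five values the block shrinks from 20 raw bytes to 9, while a one-value block grows from 4 bytes to 6 because the header dominates.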

“…For instance, at the matching layer, dynamic pruning techniques such as WAND [15] enhance efficiency by omitting the scoring of documents that cannot reach the final retrieved set. In the top-most re-ranking layer, Cambazoglu et al [16] showed how learning to rank models could be simplified to enhance their efficiency.…”

Section: Introduction (mentioning)

confidence: 99%

On Inverted Index Compression for Search Engine Efficiency

Catena

Macdonald

Ounis

2014

Lecture Notes in Computer Science

Abstract. Efficient access to the inverted index data structure is a key aspect for a search engine to achieve fast response times to users' queries. While the performance of an information retrieval (IR) system can be enhanced through the compression of its posting lists, there is little recent work in the literature that thoroughly compares and analyses the performance of modern integer compression schemes across different types of posting information (document ids, frequencies, positions). In this paper, we experiment with different modern integer compression algorithms, integrating these into a modern IR system. Through comprehensive experiments conducted on two large, widely used document corpora and large query sets, our results show the benefit of compression for different types of posting information to the space- and time-efficiency of the search engine. Overall, we find that the simple Frame of Reference compression scheme results in the best query response times for all types of posting information. Moreover, we observe that the frequency and position posting information in Web corpora that have large volumes of anchor text are more challenging to compress, yet compression is beneficial in reducing average query response times.

“…Despite various attempts to displace inverted indexes from their dominant position for document ranking tasks over the years, no alternative has been able to consistently produce the same level of efficiency, effectiveness, and time / space trade-offs that inverted indexes can provide (see, for instance Zobel et al [45]). Ranked document retrieval requires that only the top-k documents are returned, and, as a result, researchers have proposed many heuristic approaches to improve top-k efficiency [1,4,5,6,32,38]. These approaches can be classified in two general categories: term-at-a-time (TAAT) and document-at-a-time (DAAT).…”

Section: Inverted Indexes (mentioning)

confidence: 99%

“…As the minimum bounding score in the heap slowly increases, more and more postings can be omitted. Enhanced DAAT pruning strategies similar in spirit to MAXSCORE have been shown to further increase efficiency [4,38]. Turtle and Flood also describe a similar approach to improve the efficiency of TAAT strategies.…”

Section: Document-at-a-time Processing (DAAT) (mentioning)

confidence: 99%
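
The threshold effect described in this statement can be illustrated with a deliberately simplified document-at-a-time loop. It is a sketch in the spirit of upper-bound pruning, not the WAND or MAXSCORE algorithms themselves (there is no pivot selection or list skipping): the function name `top_k_daat`, the in-memory `postings` layout, and the sample scores are all assumptions made for the example.

```python
import heapq


def top_k_daat(postings, k):
    """postings: {term: {doc_id: score_contribution}}.
    Returns (top-k list of (score, doc_id), number of fully scored docs)."""
    if k <= 0:
        return [], 0
    # Per-term upper bound on the contribution any document can receive.
    upper = {t: max(p.values()) for t, p in postings.items() if p}
    docs = sorted(set().union(*(p.keys() for p in postings.values())))
    heap = []    # min-heap; heap[0][0] is the current entry threshold
    scored = 0
    for d in docs:
        terms = [t for t, p in postings.items() if d in p]
        bound = sum(upper[t] for t in terms)
        # Once the heap is full, a document whose best possible score does
        # not beat the threshold can be omitted without scoring it.
        if len(heap) == k and bound <= heap[0][0]:
            continue
        scored += 1
        score = sum(postings[t][d] for t in terms)
        if len(heap) < k:
            heapq.heappush(heap, (score, d))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, d))
    return sorted(heap, reverse=True), scored


postings = {
    "a": {1: 1.0, 2: 3.0, 5: 0.5},
    "b": {2: 2.0, 3: 0.2, 5: 2.5},
    "c": {4: 0.1},
}
print(top_k_daat(postings, 2))  # ([(5.0, 2), (3.0, 5)], 4)
```

In this toy run document 4 is never scored, because its upper bound of 0.1 cannot exceed the heap threshold of 1.0 at that point; as the threshold rises, more documents fall into that category.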

Efficient in-memory top-k document retrieval

Culpepper

Petri

Scholer

2012

Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval

For over forty years the dominant data structure for ranked document retrieval has been the inverted index. Inverted indexes are effective for a variety of document retrieval tasks, and particularly efficient for large data collection scenarios that require disk access and storage. However, many efficiency-bound search tasks can now easily be supported entirely in-memory as a result of recent hardware advances. In this paper we present a hybrid algorithmic framework for in-memory bag-of-words ranked document retrieval using a self-index derived from the FM-Index, wavelet tree, and the compressed suffix tree data structures, and evaluate the various algorithmic trade-offs for performing efficient queries entirely in-memory. We compare our approach with two classic approaches to bag-of-words queries using inverted indexes, term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing. We show that our framework is competitive with state-of-the-art indexing structures, and describe new capabilities provided by our algorithms that can be leveraged by future systems to improve effectiveness and efficiency for a variety of fundamental search operations.
