Improving the Performance of Pipelined Query Processing with Skipping

Jonassen, Simon; Bratsberg, Svein Erik

doi:10.1007/978-3-642-35063-4_1

Cited by 6 publications

(6 citation statements)

References 12 publications

“…The access cost is negligible compared to the cost of posting list processing. The broker node stores a memory-based array of maximum scores which contributes with another 200MB of data, but according to our observations in an earlier work [Jonassen and Bratsberg 2012a], this significantly improves the performance.…”

Section: Methods (mentioning)

confidence: 85%

“…The query processing framework is implemented in Java. 4 In the implementation, we use the Okapi BM25 scoring model with skipping and MaxScore optimizations [Jonassen and Bratsberg 2012a].…”

Section: Methods (mentioning)

confidence: 99%
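
The statement above names Okapi BM25 as the scoring model and MaxScore as a pruning optimization. As a rough illustration only, the sketch below computes a per-term BM25 contribution and a loose upper bound of the kind MaxScore-style pruning relies on. The function names, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are assumptions for this example, not details taken from the cited implementation.

```python
import math


def bm25_idf(n_docs, doc_freq):
    # Smoothed IDF variant (an assumption; several variants exist).
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Contribution of one query term to one document's BM25 score."""
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return bm25_idf(n_docs, doc_freq) * tf * (k1 + 1.0) / (tf + length_norm)


def bm25_term_upper_bound(n_docs, doc_freq, k1=1.2):
    """Loose bound: tf * (k1 + 1) / (tf + norm) is below k1 + 1 when norm > 0."""
    return bm25_idf(n_docs, doc_freq) * (k1 + 1.0)
```

A pruning scheme can skip a posting when the sum of such bounds for the remaining terms cannot lift a document into the current top-k. A real system would typically store a tighter, precomputed maximum score per term.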

“…The pipelined query processing scheme is originally proposed by Moffat et al. [2007] and is later enhanced by Jonassen and Bratsberg [2010, 2012a, 2012b]. This scheme solves the bottleneck problem at the central broker and can also improve the computational load balance of the system.…”

Section: Pipelined Query Processing Scheme (mentioning)

confidence: 99%

“…The proposed model simply aims to minimize the average hitting set size of queries, trying to gather coaccessed query terms on the same index servers. This approach indirectly reduces the communication volume while increasing the computational benefits due to query processing optimizations (e.g., skipping and MaxScore [Jonassen and Bratsberg 2012a]). The reason our model cannot directly capture the communication volume is because this information becomes available during the actual query evaluation, that is, it is not available during the model construction phase.…”

Section: Model (mentioning)

confidence: 99%
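
The objective quoted above, the average hitting set size of queries, can be read as the average number of distinct index servers that hold at least one term of a query. A minimal sketch under that reading follows; the function name and the dictionary-based term assignment are hypothetical and not drawn from the cited paper.

```python
def average_hitting_set_size(queries, term_to_server):
    """Average number of distinct servers touched per query.

    queries: iterable of term lists, e.g. extracted from a query log.
    term_to_server: hypothetical mapping from term to index server id.
    Terms absent from the mapping are ignored.
    """
    sizes = []
    for terms in queries:
        servers = {term_to_server[t] for t in terms if t in term_to_server}
        sizes.append(len(servers))
    return sum(sizes) / len(sizes) if sizes else 0.0
```

With queries `[["a", "b"], ["a", "c"]]`, placing `a` and `b` on one server and `c` on another gives sizes 1 and 2, so the average is 1.5. Placing all three terms on one server would give 1.0, at the cost of load balance.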

A term-based inverted index partitioning model for efficient distributed query processing

et al. 2013

Self Cite

In a shared-nothing, distributed text retrieval system, queries are processed over an inverted index that is partitioned among a number of index servers. In practice, the index is either document-based or term-based partitioned. This choice is made depending on the properties of the underlying hardware infrastructure, query traffic distribution, and some performance and availability constraints. In query processing on retrieval systems that adopt a term-based index partitioning strategy, the high communication overhead due to the transfer of large amounts of data from the index servers forms a major performance bottleneck, deteriorating the scalability of the entire distributed retrieval system. In this work, to alleviate this problem, we propose a novel inverted index partitioning model that relies on hypergraph partitioning. In the proposed model, concurrently accessed index entries are assigned to the same index servers, based on the inverted index access patterns extracted from the past query logs. The model aims to minimize the communication overhead that will be incurred by future queries while maintaining the computational load balance among the index servers. We evaluate the performance of the proposed model through extensive experiments using a real-life text collection and a search query sample. Our results show that considerable performance gains can be achieved relative to the term-based index partitioning strategies previously proposed in the literature. In most cases, however, the performance remains inferior to that attained by document-based partitioning. © 2013 ACM

“…Documents can be retrieved and ranked by matching the query vector versus the document vector to compute the score or similarity. The retrieved documents are ranked according to the similarity to the user query [33][34][35][36].…”

Section: Matching and Ranking (mentioning)

confidence: 99%
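
The statement above describes ranking by matching a query vector against document vectors. One common way to do this, shown here only as an illustrative sketch, is cosine similarity over raw term-frequency vectors. The whitespace tokenization and the function names are assumptions for this example; the cited paper may weight terms differently.

```python
import math
from collections import Counter


def cosine(q, d):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by descending similarity.

    docs: hypothetical mapping from document id to raw text.
    """
    q_vec = Counter(query.lower().split())
    scored = [(doc_id, cosine(q_vec, Counter(text.lower().split())))
              for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank("query expansion", docs)` on a small collection places documents that share more query terms, relative to their length, nearer the top.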

Query Expansion for Arabic Information Retrieval Model: Performance Analysis and Modification

Elnahaas, El-Fishawy, Elsayed, et al. 2018

The Egyptian Journal of Language Engineering

Information retrieval aims to find all relevant documents responding to a query from textual data. A good information retrieval system should retrieve only those documents that satisfy the user query. Although several models have been developed, most Arabic information retrieval models do not satisfy user needs. This is because the Arabic language is more powerful and has complex morphology as well as high polysemy. This paper first investigates the most recent Arabic information retrieval model and then presents two different approaches to enhance the effectiveness of the adopted model. The main idea of the proposed approaches is to modify and/or expand the user query. The first approach expands the user query by using the semantics of words according to an Arabic dictionary. The second approach modifies and/or expands the user query by adding useful information from pseudo relevance feedback. In other words, the query is modified by selecting relevant textual keywords for expanding the query and weeding out the non-related textual words. The adopted retrieval model and the two proposed approaches are implemented, tested, compared, and evaluated on an Arabic document collection. The obtained results show that the proposed approaches enhance the effectiveness of the Arabic information retrieval model by about 15% to 35%.

Improving the performance of pipelined query processing with skipping—and its comparison to document-wise partitioning

Jonassen, Bratsberg 2013

World Wide Web

Self Cite

Abstract. Web search engines need to provide high throughput and short query latency. Recent results show that pipelined query processing over a term-wise partitioned inverted index may have superior throughput. However, the query processing latency and scalability with respect to the collection size are the main challenges associated with this method. In this paper, we evaluate the effect of inverted index skipping on the performance of pipelined query processing. Further, we introduce a novel idea of using Max-Score pruning within pipelined query processing and a new term assignment heuristic, partitioning by Max-Score. Our current results indicate a significant improvement over the state-of-the-art approach and lead to several further optimizations, which include dynamic load balancing, intra-query concurrent processing and a hybrid combination between pipelined and non-pipelined execution.
