2013
DOI: 10.1145/2516633.2516637

A term-based inverted index partitioning model for efficient distributed query processing

Abstract: In a shared-nothing, distributed text retrieval system, queries are processed over an inverted index that is partitioned among a number of index servers. In practice, the index is either document-based or term-based partitioned. This choice is made depending on the properties of the underlying hardware infrastructure, query traffic distribution, and some performance and availability constraints. In query processing on retrieval systems that adopt a term-based index partitioning strategy, the high communication…
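As a rough illustration of the two partitioning schemes named in the abstract, the sketch below splits a toy inverted index both ways. It is not the paper's model: the corpus, the round-robin assignment rule, and all function names are assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def document_based_partition(docs, num_servers):
    """Each server indexes only its own subset of documents.

    Assumption: documents are assigned round-robin by document id.
    """
    shards = [dict() for _ in range(num_servers)]
    for doc_id, text in docs.items():
        shards[doc_id % num_servers][doc_id] = text
    return [build_inverted_index(shard) for shard in shards]


def term_based_partition(docs, num_servers):
    """Each server holds the complete posting lists of its own subset of terms.

    Assumption: terms are assigned round-robin in alphabetical order.
    """
    full_index = build_inverted_index(docs)
    shards = [dict() for _ in range(num_servers)]
    for position, term in enumerate(sorted(full_index)):
        shards[position % num_servers][term] = full_index[term]
    return shards


def servers_holding(term, shards):
    """Return the ids of the servers that store postings for the term."""
    return [i for i, shard in enumerate(shards) if term in shard]


# Hypothetical toy corpus.
docs = {
    0: "index query server",
    1: "query term",
    2: "index term partition",
    3: "query partition",
}
doc_shards = document_based_partition(docs, 2)
term_shards = term_based_partition(docs, 2)
```

In this toy setup, postings for `query` are spread over both servers under document-based partitioning, whereas under term-based partitioning a single server holds the whole posting list. A multi-term query may therefore have to move partial results between the servers that own its terms.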

Citations: cited by 30 publications (20 citation statements)
References: 30 publications
“…The difference is that, in our case, we need to send a smaller amount of data among processors, versus the partial results (in some cases, whole inverted lists) that must be sent in inverted indexes. Also, while load imbalance is a problem in the pipelined strategy [12,60], Multiplexed should be less sensitive to the query bias. The first stage of Multiplexed query processing is fully balanced because any processor can carry out the binary search (indeed, the same query can be started by a different processor each time it is raised).…”
Section: Global Multiplexed Suffix Array (mentioning)
confidence: 99%
“…Herein, we present several skipping optimizations and a new term assignment strategy. In contrast to the previously presented assignment optimizations [2,8,15], our strategy does not try to assign co-occurring terms to the same node or to do load balancing, but rather to maximize the pruning efficiency. Additionally, it opens a possibility for dynamic load balancing with low repartitioning overhead and hybrid query processing.…”
Section: Related Work (mentioning)
confidence: 99%
“…In order to show the validity of the algorithms proposed in our paper, we investigate undirectional HP models proposed for index partitioning of parallel IR systems [8,28], where replication is beneficial and commonly used [37]. Although we address the HP models used in parallel IR, our replication scheme can be used for any domain in which the underlying problem can be modeled as an undirected hypergraph.…”
Section: Application (mentioning)
confidence: 99%
“…In this HP model, the nets have unit costs due to the infinite result cache capacity assumption. 1 The weight of a vertex is set equal to either the number of postings in the inverted list of the term represented by that vertex [8] or the multiplication of term popularity and the corresponding posting list size [37]. The balance constraint in the former vertex weighting scheme corresponds to maintaining storage balance, whereas the balance constraint in the latter vertex weighting scheme corresponds to maintaining computational workload balance.…”
Section: Application (mentioning)
confidence: 99%
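The two vertex weighting schemes described in the statement above can be sketched as follows. This is a minimal illustration, not code from either cited paper: the posting lists, popularity counts, term-to-part assignment, and the imbalance measure (maximum part load over average part load, minus one) are all assumptions chosen for the example.

```python
def storage_weights(posting_lists):
    """Vertex weight = number of postings in the term's inverted list."""
    return {term: len(postings) for term, postings in posting_lists.items()}


def workload_weights(posting_lists, popularity):
    """Vertex weight = term popularity multiplied by posting list size."""
    return {
        term: popularity.get(term, 0) * len(postings)
        for term, postings in posting_lists.items()
    }


def part_loads(weights, assignment, num_parts):
    """Sum the vertex weights assigned to each part (index server)."""
    loads = [0] * num_parts
    for term, part in assignment.items():
        loads[part] += weights[term]
    return loads


def imbalance(loads):
    """Assumed measure: max load relative to average load, minus one."""
    average = sum(loads) / len(loads)
    return max(loads) / average - 1


# Hypothetical data: four terms, two parts.
posting_lists = {
    "index": [0, 2],
    "query": [0, 1, 3],
    "term": [1, 2],
    "partition": [2, 3],
}
popularity = {"index": 1, "query": 10, "term": 2, "partition": 1}
assignment = {"index": 0, "query": 1, "term": 0, "partition": 1}

storage_loads = part_loads(storage_weights(posting_lists), assignment, 2)
work_loads = part_loads(workload_weights(posting_lists, popularity), assignment, 2)
```

With these made-up numbers the same assignment is close to balanced under the storage weights but heavily skewed under the workload weights, because one popular term dominates the second part. That is the sense in which the two balance constraints target different resources.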