Proceedings of the 21st ACM International Conference on Information and Knowledge Management 2012
DOI: 10.1145/2396761.2396833

Shard ranking and cutoff estimation for topically partitioned collections

Abstract: Large document collections can be partitioned into topical shards to facilitate distributed search [19]. In a low-resource search environment only a few of the shards can be searched in parallel. Such a search environment faces two intertwined challenges. First, determining which shards to consult for a given query: shard ranking. Second, how many shards to consult from the ranking: cutoff estimation. In this paper we present a family of three algorithms that address both of these problems. As a basis we emplo…

citations
Cited by 36 publications
(47 citation statements)
references
References 35 publications
“…There are several other resource allocation techniques available but covering all of them is beyond the scope of this paper. See for example Aly et al (2013), Callan, Lu, and Croft (1995), Kulkarni, Tigelaar, Hiemstra, and Callan (2012), and Xu and Callan (1998). Here we focus on resource allocation schemes which depend on sampling, such as ReDDE and Rank-S. We refer to any sample-based resource allocation scheme as a central sample index (CSI) in the remainder of this paper.…”
Section: Case Study (mentioning)
confidence: 99%
“…Distributed information retrieval (IR) is of substantial interest to the IR research community (e.g., [2][3][4]6], [8], [9], [23], [24], and [26][27][28]). Although traditionally focused on efficiency within a search site, such as by partitioning a document collection into smaller collections or "shards" (e.g., [26][27][28]), recent research has demonstrated the feasibility of distributed IR at the system level (e.g., [2], [4], [8], [9], [23], and [24]).…”
Section: Related Efforts (mentioning)
confidence: 99%
“…Kulkarni, Tigelaar, Hiemstra, and Callan (2012) proposed three algorithms in the context of topic-based partitions to decide which collection and how many partitions to select. It used the so called central sample index (CSI) as an inverted index of a small sample of randomly selected documents from each partition.…”
Section: Introduction (mentioning)
confidence: 99%
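
The citation statements above describe the central sample index (CSI) as an inverted index over a small random sample of documents from each partition, used to decide which shards to search. The sketch below is a minimal illustration of that general idea only, not the algorithms of Kulkarni et al. (2012), ReDDE, or Rank-S. The function names (`build_csi`, `rank_shards`), the term-frequency scoring, the `sample_rate` and `top_n` parameters, and the one-vote-per-sampled-document scheme are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_csi(shards, sample_rate=0.5, seed=0):
    """Build a toy central sample index (hypothetical helper).

    shards: dict mapping shard id -> list of document strings.
    Returns (index, doc_terms): term -> sampled doc ids, and
    sampled doc id -> term counts. A doc id is (shard_id, position).
    """
    rng = random.Random(seed)
    index = defaultdict(set)
    doc_terms = {}
    for shard_id, docs in shards.items():
        # Sample at least one document per non-empty shard.
        k = min(len(docs), max(1, round(len(docs) * sample_rate)))
        for pos in rng.sample(range(len(docs)), k):
            doc_id = (shard_id, pos)
            terms = Counter(docs[pos].lower().split())
            doc_terms[doc_id] = terms
            for term in terms:
                index[term].add(doc_id)
    return index, doc_terms


def rank_shards(query, index, doc_terms, top_n=10):
    """Rank shards by votes from the top-scoring sampled documents.

    Scoring is plain query-term frequency (an assumption, not a
    retrieval model taken from the cited papers).
    """
    q_terms = query.lower().split()
    candidates = set()
    for term in q_terms:
        candidates |= index.get(term, set())
    scored = sorted(
        candidates,
        key=lambda d: (-sum(doc_terms[d][t] for t in q_terms), d),
    )
    # Each of the top_n sampled documents casts one vote for its shard.
    votes = Counter(doc_id[0] for doc_id in scored[:top_n])
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    shards = {
        "sports": ["football match goal", "tennis match serve"],
        "cooking": ["pasta sauce recipe", "bread oven recipe"],
    }
    index, doc_terms = build_csi(shards, sample_rate=1.0)
    print(rank_shards("match goal", index, doc_terms))
```

In this sketch, shards that receive no votes simply drop out of the ranking, which acts as a naive cutoff. The cutoff estimation described in the abstract is a separate problem that this example does not attempt to reproduce.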