Proceedings of the 19th ACM International Conference on Information and Knowledge Management 2010
DOI: 10.1145/1871437.1871497

Document allocation policies for selective searching of distributed indexes

Abstract: Indexes for large collections are often divided into shards that are distributed across multiple computers and searched in parallel to provide rapid interactive search. Typically, all index shards are searched for each query. For organizations with modest computational resources the high query processing cost incurred in this exhaustive search setup can be a deterrent to working with large collections. This paper investigates document allocation policies that permit searching only a few shards for each query (…

Citations: cited by 46 publications (73 citation statements)
References: 18 publications
“…However, in this approach one step goes forward, using an ontology-based fuzzy similarity, based on both semantic and structural issues. The centralization used in these techniques [28][29][30][31] for dealing with a distributed framework results in a reduction of the search efficiency. Obviously, gathering information within the distributed system causes a computational overload.…”
Section: B Experiments (mentioning)
confidence: 99%
“…Recent research has focused on reducing the search cost per query without hurting overall effectiveness by reordering the documents in each shard by topic or similarity [3]. These systems are able to achieve effectiveness close to a search over the entire collection (exhaustive search) while using only a few shards for each Table 1: The proportion of system instances that demonstrated a significant difference using a paired t-test, and the p values when comparing the sample-based IR algorithm proposed by Kulkarni and Callan [3] at varying CSI sample rates with a deterministic exhaustive search, and with itself (a nondeterministic algorithm) with a CSI sample rate of 4% using the TREC GOV2 dataset and TREC topics 701 -850.…”
Section: Case Study (mentioning)
confidence: 99%
“…Thus, we have 10 different instances of the sharded index. As with the original experiments [3], 50 shards were formed per instance, and the full dependency model (FDM) is used to rank the queries [4]. Selecting a subset of 5 shards produced equivalent retrieval results at depth 10 to exhaustive search [3].…”
Section: Experimental Testbed (mentioning)
confidence: 99%
“…The goals of index partitioning algorithms are to distribute documents across nodes based on document similarity, to facilitate the efficient selection of retrieval resources, such that documents relevant to a query are concentrated across a few shards [22]. There are two main index partitioning strategies [9]:…”
Section: Introduction (mentioning)
confidence: 99%