Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2000
DOI: 10.1145/345508.345591

Partial collection replication versus caching for information retrieval systems

Abstract: The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). Caches return results when queries exactly match a previous one. Partial replicas are a form of caching that re…
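
The exact-match behavior the abstract attributes to caches can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `ExactMatchQueryCache`, the `backend` callable standing in for the original collection(s), and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class ExactMatchQueryCache:
    """Hypothetical query cache: a hit requires the identical query string."""

    def __init__(self, capacity, backend):
        self._capacity = capacity      # maximum number of cached queries
        self._backend = backend        # assumed callable: query -> results
        self._entries = OrderedDict()  # query string -> cached results
        self.hits = 0
        self.misses = 0

    def search(self, query):
        if query in self._entries:
            # Exact match with a previous query: answer from the cache.
            self._entries.move_to_end(query)
            self.hits += 1
            return self._entries[query]
        # Any other query, however similar, goes to the original collection.
        self.misses += 1
        results = self._backend(query)
        self._entries[query] = results
        if len(self._entries) > self._capacity:
            # LRU eviction is an assumption, not taken from the source.
            self._entries.popitem(last=False)
        return results
```

In this sketch a query that differs from an earlier one by even a single character is a miss, which is the limitation the abstract contrasts with partial replicas.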

Cited by 22 publications (13 citation statements)
References 23 publications

“…Several articles [2], [5], [12] analyze the performance of a distributed IR system using collections of different sizes and different system architectures. Cahoon and McKinley in [3] describe the result of simulated experiments on the distributed INQUERY architecture.…”
Section: Related Work (mentioning)
confidence: 99%
“…This is due to the round robin distribution policy used in the brokers, as it can lead to some small periods of inactivity at certain replicas. In future works, some other distribution policies can be analysed in order to improve the throughput up to the optimal theoretical value, similar to the one used in [12].…”
Section: Replicated System (mentioning)
confidence: 99%
“…Our approach is that beacons remember the results of previous user queries, and use these results to guide future queries. Unlike previous caching schemes (such as [27]), the InfoBeacons cache is not used to answer queries but instead to direct queries to the sources themselves. We introduce a function, called ProbResults, that ranks sources for a given query based on past results stored in the beacon's cache.…”
Section: Introduction (mentioning)
confidence: 99%
“…The base sub-collection of 8.5 million documents has been distributed over N query servers using a switched network and three brokers, where N = 1, 2, 4, 8, 16, 32, 64, 128, 256 and 512. In Table 1, the column Configuration describes the query servers assigned to each topic.…”
Section: Experimental Setting (mentioning)
confidence: 99%
“…Frieder and Siegelmann [9] studied the organisation of the data to improve the performance of parallel IR systems using multiprocessor computers. Lu and McKinley [16] analysed the effects of partial replication to improve the performance in a collection of 1TB. Moffat, Webber, Zobel and Baeza-Yates [18] presented a replication technique for a pipelined term distributed system, which significantly improves the throughput over a basic term distributed system.…”
Section: Introduction (mentioning)
confidence: 99%