Dissemination of collection wide information in a distributed information retrieval system

Viles, Charles L.; French, James C.

doi:10.1145/215206.215327

Cited by 45 publications

(33 citation statements)

References 12 publications


“…This finding appears to contradict the findings of Viles and French [19,9]; however, there are a number of differences between the experiments reported here and the experiments performed in that work. A discussion of those experimental differences and an analysis of the implications of our results appear in Section 6.…”

Section: Comparing Dist-CWI and Dist-LI (contrasting)

confidence: 99%

“…An issue that deserves immediate attention is the apparent contradiction of this work and the work of Viles and French [9,19]. Based on the work of Viles and French, we expected that Hypothesis 3 would be true (i.e., the use of CWI would improve distributed retrieval performance); however, the dist-LI results were significantly better than the dist-CWI results.…”

Section: CWI and Merging Analysis (mentioning)

confidence: 83%

“…Xu and Callan [22] showed that poor database selection performance hindered distributed retrieval performance, and investigated the use of query expansion and phrases in database selection. Viles and French [9,19] showed that dissemination of collection information increased retrieval effectiveness. Xu and Croft [23] explored cluster-based language models, investigating different ways to construct database selection indexes.…”

Section: Distributed Retrieval Database Selection and Results Merging (mentioning)

confidence: 99%

“…We also consider the combination of both database selection and the dissemination of collection-wide information. Viles and French [19,9] studied the use of CWI in a distributed environment in which database selection was not used, while Xu and Croft used CWI for all experiments reported in [23].…”

Section: Experimental Methodology (mentioning)

confidence: 99%


The impact of database selection on distributed searching

Powell, French, Callan, et al. 2000

Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Self Cite


The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in three different distributed retrieval testbeds and distill some general results. First, we find that good database selection can result in better retrieval effectiveness than can be achieved in a centralized database. Second, we find that good performance can be achieved when only a few sites are selected and that the performance generally increases as more sites are selected. Finally, we find that when database selection is employed, it is not necessary to maintain collection wide information (CWI), e.g., global idf. Local information can be used to achieve superior performance. This means that distributed systems can be engineered with more autonomy and less cooperation. This work suggests that improvements in database selection can lead to broader improvements in retrieval performance, even in centralized (i.e., single database) systems. Given a centralized database and a good selection mechanism, retrieval performance can be improved by decomposing that database conceptually and employing a selection step.
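To make the distinction between local information and collection wide information concrete, the following is a minimal sketch, not the method of any paper listed here. The two toy databases, the function names, and the smoothed idf formula are all assumptions chosen for illustration; the abstract names only "global idf" as an example of CWI.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return df


def idf(term, df, n_docs):
    """Smoothed idf, log((N + 1) / (df + 1)); one common variant, assumed here."""
    return math.log((n_docs + 1) / (df.get(term, 0) + 1))


# Two hypothetical databases (toy data, not taken from any cited testbed).
site_a = ["stock market report", "stock prices fall", "stock trading volume"]
site_b = ["weather report today", "market weather outlook", "rain expected today"]

# Local information: each site knows only its own document frequencies.
df_a = document_frequencies(site_a)
df_b = document_frequencies(site_b)

# Collection wide information: frequencies pooled over every site.
df_global = df_a + df_b
n_global = len(site_a) + len(site_b)

local_idf_a = idf("stock", df_a, len(site_a))
local_idf_b = idf("stock", df_b, len(site_b))
global_idf = idf("stock", df_global, n_global)

print(f"local idf at site A: {local_idf_a:.3f}")
print(f"local idf at site B: {local_idf_b:.3f}")
print(f"global idf:          {global_idf:.3f}")
```

In this toy setup the term "stock" appears in every document at site A and in none at site B, so the two local idf values sit on either side of the global value. The sketch shows only that the statistics differ; whether that difference matters for retrieval effectiveness is the empirical question the citing and cited papers disagree on.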


“…Others have also created resource selection testbeds by dividing the TREC data into multiple databases, usually also partitioning the data along source and publication date criteria, for example (Voorhees et al., 1995b; Viles and French, 1995; Hawking and Thistlewaite, 1999; French et al., 1998). Indeed, there are few widely available alternative sources of data for creating resource selection testbeds.…”

Section: Multi-database Testbeds (mentioning)

confidence: 99%

Distributed Information Retrieval

Callan

The Information Retrieval Series


A multi-database model of distributed information retrieval is presented, in which people are assumed to have access to many searchable text databases. In such an environment, full-text information retrieval consists of discovering database contents, ranking databases by their expected ability to satisfy the query, searching a small number of databases, and merging results returned by different databases. This paper presents algorithms for each task. It also discusses how to reorganize conventional test collections into multi-database testbeds, and evaluation methodologies for multi-database experiments. A broad and diverse group of experimental results is presented to demonstrate that the algorithms are effective, efficient, robust, and scalable.


The FedLemur project: Federated search in the real world

Avrahami, Yau, et al. 2005

J. Am. Soc. Inf. Sci.


Federated search and distributed information retrieval systems provide a single user interface for searching multiple full-text search engines. They have been an active area of research for more than a decade, but in spite of their success as a research topic, they are still rare in operational environments. This article discusses a prototype federated search system developed for the U.S. government's FedStats Web portal, and the issues addressed in adapting research solutions to this operational environment. A series of experiments explore how well prior research results, parameter settings, and heuristics apply in the FedStats environment. The article concludes with a set of lessons learned from this technology transfer effort, including observations about search engine quality in the "real world."

Introduction

The FedStats Web site is a portal that provides "one-stop shopping" to statistical information published by more than 100 federal agencies so that citizens, businesses, and government employees can find what they need without knowing where it is stored or which agency publishes it. Topic-specific Web portals such as FedStats have become a crucial component of Web search in recent years because the proliferation of Web sites and search engines can make it difficult for people to know where to search for needed information. General-purpose search engines, such as Google and AltaVista, can be helpful, but their generality is sometimes more of an obstacle than an aid. For example, submitting the query "unemployment statistics" to Google returns a mix of federal, state, and foreign government information in the top 10 documents. Restricting the search to the ".gov" domain effects only a small improvement. The same query at the FedStats Web site returns information from 12 federal government agencies.

Portals such as FedStats are usually based on one of two software architectures. The most common approach is to download documents from other Web sites, integrate them into a single large text database, and index it with a single search engine. General-purpose search engines, such as Google, use this approach; we call it the single-database approach in this report. The second approach is to link the search engines at each Web site into a federated search system. This approach is used within some large commercial search services (e.g.,

Federated Search (Distributed Information Retrieval)

Federated search systems provide a single-user interface to multiple search engines. The person using the federated search system may know (probably knows) that the
