Methodologies for distributed information retrieval

Kretser, Owen de; Moffat, Alistair; Shimmin, T.; Zobel, Justin

doi:10.1109/icdcs.1998.679488

Cited by 29 publications

(15 citation statements)

References 20 publications


“…In principle, particularly in the absence of cooperation, it would be possible for a collection to contain many relevant documents, be highly ranked, but for the collection's ranking mechanism to be unable to find the documents. In practice, in cooperative systems in which information such as term statistics are shared, this is no more likely than in a monolithic system (de Kretser et al, 1998). Thus it is reasonable to compare systems by their ability to find collections with relevant documents, while noting that there remains the issue of how to find documents within collections and combine these results.…”

Section: Introduction (mentioning)

confidence: 99%

Collection selection for managed distributed document databases

D'Souza, Thom, Zobel (2004), Information Processing & Management


“…A straightforward way of distributing the retrieval task is to allocate each computer, or server, a defined fraction of the documents and then build an index for each local document set (Harman et al, 1991;de Kretser et al, 1998;Cahoon et al, 2000). Each index consists of a complete vocabulary for the documents on that computer and, for each term in the vocabulary, an inverted list recording the documents containing the term and (if phrase querying is to be supported) the positions in each document at which the term occurs.…”

Section: Document-partitioned Indexing and Querying (mentioning)

confidence: 99%

A pipelined architecture for distributed text query evaluation

et al. 2006


Two principal query-evaluation methodologies have been described for cluster-based implementation of distributed information retrieval systems: document partitioning and term partitioning. In a document-partitioned system, each of the processors hosts a subset of the documents in the collection, and executes every query against its local subcollection. In a term-partitioned system, each of the processors hosts a subset of the inverted lists that make up the index of the collection, and serves them to a central machine as they are required for query evaluation.

In this paper we introduce a pipelined query-evaluation methodology, based on a term-partitioned index, in which partially evaluated queries are passed amongst the set of processors that host the query terms. This arrangement retains the disk read benefits of term partitioning, but more effectively shares the computational load. We compare the three methodologies experimentally, and show that term distribution is inefficient and scales poorly. The new pipelined approach offers efficient memory utilization and efficient use of disk accesses, but suffers from problems with load balancing between nodes. Until these problems are resolved, document partitioning remains the preferred method.
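The three methodologies described in this abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function names, the round-robin assignment of documents and terms to servers, and the scoring rule (summed term frequency) are all assumptions chosen to keep the example short. Each local index maps a term to an inverted list of documents and the positions at which the term occurs.

```python
from collections import defaultdict


def build_index(docs):
    """Positional inverted index: term -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


def partition_documents(docs, n_servers):
    """Document partitioning: each server indexes its own share of the documents.
    Round-robin assignment is an assumption made for this sketch."""
    shards = [{} for _ in range(n_servers)]
    for i, doc_id in enumerate(sorted(docs)):
        shards[i % n_servers][doc_id] = docs[doc_id]
    return [build_index(shard) for shard in shards]


def partition_terms(docs, n_servers):
    """Term partitioning: each server holds complete inverted lists
    for its own share of the vocabulary (round-robin, by assumption)."""
    full = build_index(docs)
    shards = [{} for _ in range(n_servers)]
    for i, term in enumerate(sorted(full)):
        shards[i % n_servers][term] = full[term]
    return shards


def score_list(accumulators, postings):
    """Toy scoring rule (an assumption): add the term frequency per document."""
    for doc_id, positions in postings.items():
        accumulators[doc_id] = accumulators.get(doc_id, 0) + len(positions)


def query_document_partitioned(shards, terms):
    """Every server runs the whole query locally; a merger combines the results."""
    merged = {}
    for index in shards:
        local = {}
        for term in terms:
            score_list(local, index.get(term, {}))
        merged.update(local)  # document sets are disjoint across servers
    return merged


def query_term_partitioned(shards, terms):
    """Servers only ship inverted lists; one central machine does all the scoring."""
    fetched = [index[t] for t in terms for index in shards if t in index]
    central = {}
    for postings in fetched:
        score_list(central, postings)
    return central


def query_pipelined(shards, terms):
    """The partially evaluated query (the accumulators) travels from server to
    server; each server scores the query terms it hosts."""
    accumulators = {}
    for index in shards:
        for term in terms:
            if term in index:
                score_list(accumulators, index[term])
    return accumulators


docs = {
    1: "distributed text retrieval",
    2: "text query evaluation text",
    3: "pipelined query evaluation",
}
query = ["text", "query"]
doc_shards = partition_documents(docs, 2)
term_shards = partition_terms(docs, 2)
print(query_document_partitioned(doc_shards, query))
print(query_term_partitioned(term_shards, query))
print(query_pipelined(term_shards, query))
```

All three functions produce the same scores on this toy collection; they differ only in where the work is done. In the term-partitioned version the scoring happens on one machine, while in the pipelined version it is spread over the servers that host the query terms.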


“…Tmap1 estimates the agents' routing times by adding the costs of the next adjacent set of nodes in the sorted list. Exceeding the upper bound of the time constraint (the threshold "T_end"), i.e., the impossibility of performing the task at the node within the time window, is not allowed when assigning an adjacent node to the tour (lines 22,23,26). In this manner, the path of the agents can be decided using the sorted list.…”

Section: Planning Algorithms (mentioning)

confidence: 99%

“…In this application, information is spread over several nodes, which are commonly geographically separated [22]. Mobile agents migrate to the nodes where the data are located to perform their retrieval tasks there instead of transmitting data across the network.…”

Section: Introduction (mentioning)

confidence: 99%