Proceedings of the 2nd International ICST Conference on Scalable Information Systems 2007
DOI: 10.4108/infoscale.2007.881

Query-Driven Indexing for Scalable Peer-to-Peer Text Retrieval

Abstract: We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption that has been identified as the major problem for the standard P2P approach with single term indexing, we leverage a distributed index that stores up to top-k document references only for carefully chosen indexing term combinations. In addition, since the number of possible term combinations extracted from a document collection can be very large, we p…

Cited by 15 publications (15 citation statements)
References: 0 publications
“…However, PCIR can also efficiently be used to optimize document-granularity indexing [4] as well as query-driven indexing [13,14], as we show in Section 6.…”
Section: DHT-based Query Processing (mentioning)
confidence: 99%
“…For example, in a recent version of ALVIS [14], peers identify frequent multi-term queries, and cache their results in the DHT. This offers faster query execution with lower network overhead, albeit at the expense of a larger inverted index over the DHT.…”
Section: P2P Information Retrieval (mentioning)
confidence: 99%
“…This offers a solution to one of the main drawbacks of using distributed hash tables: intersection of large posting lists. (Cuenca-Acuna et al., 2003; Suel et al., 2003; Tang and Dwarkadas, 2004; Balke et al., 2005; Michel et al., 2005a; Zhang and Suel, 2005; Skobeltsyn and Aberer, 2006; Skobeltsyn et al., 2007a, 2009) Processing only a subset of items during the search process can yield performance benefits: less data processing and lower latency. Various algorithms, discussed shortly, can be used to retrieve the top items for a particular query without having to calculate the scores for all the items.…”
Section: Approximate Intersection Of Posting Lists With Bloom Filters (mentioning)
confidence: 99%
“…Hence, the topology of the network is not determined by a key space. (Reynolds and Vahdat, 2003; Skobeltsyn and Aberer, 2006; Skobeltsyn et al., 2007a; Zimmer et al., 2008; Skobeltsyn et al., 2007b, 2009) It makes little sense to reconstruct the search result set for the same query over and over again if it does not really change. Performance can be increased significantly by caching search results.…”
Section: Involving Fewer Peers During Index Look-ups By Global Replic (mentioning)
confidence: 99%