Proceedings of the 1st International Conference on Scalable Information Systems (InfoScale '06), 2006
DOI: 10.1145/1146847.1146881

Query-driven document partitioning and collection selection

Abstract: We present a novel strategy to partition a document collection onto several servers and to perform effective collection selection. The method is based on the analysis of query logs. We propose a novel document representation called the query-vectors model: each document is represented as a list recording the queries for which the document itself is a match, along with their ranks. To both partition the collection and build the collection selection function, we co-cluster queries and documents. The docume…
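The query-vector representation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the query log is available as (query, ranked result list) pairs produced by a centralized search engine, and the function name `build_query_vectors` and the optional `depth` cut-off are hypothetical.

```python
from collections import defaultdict


def build_query_vectors(query_log, depth=None):
    """Map each document to the queries that returned it, with its rank.

    query_log: iterable of (query_id, ranked_doc_ids) pairs; rank 1 is best.
    depth: hypothetical cut-off; if set, only the top `depth` results of
           each query are recorded.
    """
    vectors = defaultdict(list)
    for query_id, ranked_docs in query_log:
        results = ranked_docs if depth is None else ranked_docs[:depth]
        for rank, doc_id in enumerate(results, start=1):
            # Record that this document matched this query at this rank.
            vectors[doc_id].append((query_id, rank))
    return dict(vectors)


if __name__ == "__main__":
    log = [("q1", ["d3", "d1", "d7"]), ("q2", ["d1", "d4"])]
    vectors = build_query_vectors(log)
    print(vectors["d1"])    # [('q1', 2), ('q2', 1)]
    print("d9" in vectors)  # False: never returned by any logged query
```

A document that no logged query ever returned gets no entry at all. The resulting document-by-query data is the kind of input a co-clustering step would operate on, though the exact weighting used in the paper is not reproduced here.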

Cited by 41 publications (46 citation statements); references 41 publications.

“…Puppin et al [15] used query logs to organize document collection into multiple shards. The query log covered a period of time when exhaustive search was used for each query.…”
Section: Document Allocation (mentioning; confidence: 99%)
“…Various approximations of relevance have been studied in P2PIR: assuming documents containing all query keywords to be relevant [2], using "approximate descriptions of relevant material" [1] or comparing results of distributed algorithms to results of a centralised system [6,4,9,8].…”
Section: Related Work (mentioning; confidence: 99%)
“…those with score > 0) relevant [6], resulting in what is sometimes called relative recall (RR), or just the N most highly ranked documents [4,9,8]. In the latter case, precision at k documents is used as an evaluation measure; we will call it P_N@k in the rest of this work, denoting its dependence on N.…”
Section: Related Work (mentioning; confidence: 99%)
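The two relevance approximations in the statement above can be made concrete. The sketch below assumes each system's answer is a ranked list of document identifiers: `precision_n_at_k` computes P_N@k by treating the top N documents of the centralized ranking as the relevant set, and `relative_recall` treats every document the centralized system matched (score > 0) as relevant. Both function names, and the choice to return 0.0 when the centralized system matched nothing, are illustrative assumptions rather than definitions taken from the cited works.

```python
def precision_n_at_k(distributed, centralized, n, k):
    """P_N@k: fraction of the distributed top-k found in the centralized top-N."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(centralized[:n])
    hits = sum(1 for doc in distributed[:k] if doc in relevant)
    # Precision at k conventionally divides by k, even if fewer than k
    # results were returned.
    return hits / k


def relative_recall(distributed, centralized_matches):
    """RR: share of centrally matched documents (score > 0) that were retrieved."""
    matches = set(centralized_matches)
    if not matches:
        return 0.0  # assumption: an empty reference set counts as zero recall
    return len(matches & set(distributed)) / len(matches)


if __name__ == "__main__":
    centralized = ["a", "b", "c", "d", "e"]  # all matched with score > 0
    distributed = ["a", "c", "x", "b"]
    print(precision_n_at_k(distributed, centralized, n=3, k=4))  # 0.75
    print(relative_recall(distributed, centralized))             # 0.6
```

With N = 3 and k = 4, three of the four distributed results fall inside the centralized top three, giving 0.75; the same lists give a relative recall of 3/5.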