2004
DOI: 10.1007/978-3-540-30213-1_4

Techniques for Efficient Query Expansion


Cited by 27 publications (27 citation statements)
References 14 publications
“…The major bottleneck is fetching the full-text top documents after they have been ranked according to the original query, because these documents are usually stored on disk and disk access times are much slower than memory access times. A more efficient approach is proposed in [Billerbeck and Zobel 2004b], making use of short document summaries to be kept in main memory in the form of a set of terms with the highest tf-idf values. During querying, all terms in the summaries that have been ranked against the original query are then used for sourcing expansion terms, thus bypassing disk access altogether and also avoiding the need of parsing the raw documents.…”
Section: Computational Efficiency (mentioning)
confidence: 99%
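The summary-based approach described in this statement can be illustrated with a small sketch. It is not the authors' implementation. It assumes a toy in-memory collection (`docs`), a summary size `k`, a given ranking under the original query (`ranked_ids`), and hypothetical helpers `build_summaries` and `expansion_terms`. The candidate-scoring rule (number of top summaries containing the term, times idf) is an illustrative assumption. The point it shows is that expansion terms are drawn only from the per-document top tf-idf term lists, so no full document is fetched from disk or parsed at query time.

```python
import math
from collections import Counter

def build_summaries(docs, k):
    """Return each document's k highest tf-idf terms, plus the idf table."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # document frequency: count each term once per doc
    idf = {t: math.log(n / df[t]) for t in df}
    summaries = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        # Ties are broken alphabetically so each summary is deterministic.
        ranked = sorted(tf, key=lambda t: (-tf[t] * idf[t], t))
        summaries[doc_id] = ranked[:k]
    return summaries, idf

def expansion_terms(ranked_ids, summaries, idf, query, r, e):
    """Pool the summaries of the top-r ranked documents and return e new terms.

    Only the in-memory summaries are read; full documents are never fetched.
    Scoring candidates by (summary count x idf) is an illustrative assumption.
    """
    counts = Counter()
    for doc_id in ranked_ids[:r]:
        counts.update(summaries[doc_id])
    candidates = [t for t in counts if t not in query]
    candidates.sort(key=lambda t: (-counts[t] * idf[t], t))
    return candidates[:e]

# Hypothetical toy collection; ranked_ids stands in for the original-query ranking.
docs = {
    "d1": ["expansion", "expansion", "query", "feedback", "the"],
    "d2": ["expansion", "feedback", "feedback", "summary", "the"],
    "d3": ["disk", "memory", "the", "cache"],
}
summaries, idf = build_summaries(docs, k=2)
terms = expansion_terms(["d1", "d2", "d3"], summaries, idf, {"query"}, r=2, e=2)
print(terms)  # ['summary', 'expansion']
```

Because summaries hold only `k` terms per document, they can stay resident in memory; the trade-off is that terms outside each document's top-`k` list can never be proposed as expansion terms.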
“…Otherwise, we report Mean Average Precision (MAP) as implemented by the trec_eval program⁴. In calculating the statistical significance between score count samples, we use a paired randomization test, as described by Smucker et al. [18].…”
Section: Discussion (mentioning)
confidence: 99%
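The two evaluation tools named in this statement can be sketched as follows. This is a minimal sketch, not trec_eval itself. It assumes runs and relevance judgments are plain Python dicts (`runs`, `qrels`), and the function names and defaults (`trials=10000`, `seed=0`) are hypothetical. Average precision is the uninterpolated form: precision at each rank holding a relevant document, summed and divided by the total number of relevant documents for the topic. The randomization test is a two-sided paired test that randomly swaps the two systems' scores per topic, which is equivalent to flipping the sign of each per-topic difference.

```python
import random

def average_precision(ranking, relevant):
    """Uninterpolated AP: sum of precision@i at relevant ranks / |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)  # unretrieved relevant docs count as zero

def mean_average_precision(runs, qrels):
    """Mean of per-topic AP over all judged topics (missing runs score 0)."""
    return sum(average_precision(runs.get(q, []), rel)
               for q, rel in qrels.items()) / len(qrels)

def paired_randomization_test(a, b, trials=10000, seed=0):
    """Two-sided p-value for the mean difference of paired per-topic scores."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the system labels are exchangeable per topic,
        # so each paired difference keeps or flips its sign with probability 1/2.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) / len(diffs) >= observed - 1e-12:
            extreme += 1
    return extreme / trials

# Hypothetical judgments and rankings for two topics.
qrels = {"q1": {"d1", "d3"}, "q2": {"y"}}
runs = {"q1": ["d1", "d2", "d3"], "q2": ["x", "y"]}
print(mean_average_precision(runs, qrels))  # (5/6 + 1/2) / 2, about 0.667
```

Sampling a fixed number of random permutations (rather than enumerating all 2^n sign patterns) keeps the test tractable for realistic topic counts; the resulting p-value is an estimate whose precision grows with `trials`.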
“…Research in the area of query expansion includes Billerbeck and Zobel, who conduct several rounds of experiments examining the use of auxiliary data structures for improving the efficiency of query expansion [5,4]. They discovered that using short summaries of documents significantly reduced the time needed to analyze the documents used for feedback, which they considered to be the largest bottleneck during automatic expansion.…”
Section: Related Work (mentioning)
confidence: 99%
“…All methods aim to generate additional query terms that are "semantically" or statistically related to the original query terms, often producing queries with more than 50 or 100 terms and appropriately chosen weights. Given the additional uncertainty induced by the expansion terms, such queries are usually considered as disjunctive queries and incur very high execution costs for a DBMS-style query processing [4,5]. The various methods differ in their sources that they exploit for inferring correlated terms: explicit relationships in thesauri, explicit relevance feedback, pseudo relevance feedback, query associations derived from query logs and click streams, summary snippets of web search engine results, extended topic descriptions (available in benchmarks), or combinations of various techniques.…”
Section: Related Work, 2.1 Query Expansion (mentioning)
confidence: 99%
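Why weighted, disjunctive expanded queries are expensive can be seen in a small term-at-a-time evaluator. This is an illustrative sketch, not the method of the cited works. It assumes a toy in-memory inverted index (`index`, mapping a term to `(doc_id, tf)` postings) and a hypothetical `disjunctive_rank` helper whose score is simply query weight times term frequency. Under OR semantics every posting of every query term must be processed, so cost grows with the number and frequency of added terms.

```python
from collections import defaultdict

def disjunctive_rank(index, weighted_query):
    """Term-at-a-time OR evaluation; returns (ranking, number of postings touched)."""
    acc = defaultdict(float)  # one score accumulator per candidate document
    touched = 0
    for term, weight in weighted_query.items():
        for doc_id, tf in index.get(term, []):
            acc[doc_id] += weight * tf  # simplified score: weight x term frequency
            touched += 1
    ranking = sorted(acc, key=lambda d: (-acc[d], d))
    return ranking, touched

# Hypothetical toy index: term -> list of (doc_id, term frequency).
index = {
    "query": [("d1", 2), ("d2", 1)],
    "expansion": [("d1", 1), ("d3", 3)],
    "feedback": [("d2", 2), ("d3", 1), ("d4", 1)],
}
print(disjunctive_rank(index, {"query": 1.0}))  # (['d1', 'd2'], 2)
print(disjunctive_rank(index, {"query": 1.0, "expansion": 0.5, "feedback": 0.25}))
# (['d1', 'd3', 'd2', 'd4'], 7)
```

Adding two down-weighted terms more than triples the postings processed in this toy case; with 50 to 100 expansion terms the same effect dominates query cost.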
“…For difficult retrieval tasks like the above, query expansion can improve precision@top-k, recall, as well as uninterpolated mean average precision (MAP) by a significant margin (see, e.g., [20,21]). However, in contrast to a mere benchmark setting such as TREC, applying these techniques in a real application with unpredictable ad-hoc queries (e.g., in digital libraries, intranet search, or web communities) faces three major problems [4,5]: 1) The threshold for selecting expansion terms needs to be carefully hand-tuned, and this is highly dependent on the application's corpus and query workload. 2) An inappropriate choice of the sensitive expansion threshold may result in either not achieving the desired improvement in recall (if the threshold is set too conservatively) or in high danger of topic dilution (if the query is expanded too aggressively).…”
Section: Introduction (mentioning)
confidence: 99%
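The threshold sensitivity described in this statement can be made concrete with a small selector. This is a sketch under stated assumptions, not the selection rule of the cited works. The hypothetical `select_expansion_terms` keeps candidates whose score is at least `threshold` times the best candidate's score (a relative threshold chosen for illustration), and the candidate scores are invented. A high threshold adds little, while a very low one admits weakly related terms such as `weather`, which is the topic-dilution risk the statement mentions.

```python
def select_expansion_terms(candidate_scores, threshold):
    """Keep candidates whose score is at least `threshold` times the best score."""
    top = max(candidate_scores.values(), default=0.0)
    if top <= 0:
        return []
    kept = [t for t, s in candidate_scores.items() if s / top >= threshold]
    return sorted(kept, key=lambda t: (-candidate_scores[t], t))

# Hypothetical candidate scores for one query.
scores = {"retrieval": 0.9, "ranking": 0.6, "index": 0.3, "weather": 0.05}
for theta in (0.8, 0.5, 0.01):
    print(theta, select_expansion_terms(scores, theta))
# 0.8 ['retrieval']
# 0.5 ['retrieval', 'ranking']
# 0.01 ['retrieval', 'ranking', 'index', 'weather']
```

Which `theta` works well depends on the score distribution, which in turn varies with the corpus and query workload; this is why such a threshold ends up being hand-tuned per application.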