Techniques for Efficient Query Expansion

Billerbeck, Bodo; Zobel, Justin

doi:10.1007/978-3-540-30213-1_4

Cited by 27 publications

(27 citation statements)

References 14 publications


“…The major bottleneck is fetching the full-text top documents after they have been ranked according to the original query, because these documents are usually stored on disk and disk access times are much slower than memory access times. A more efficient approach is proposed in [Billerbeck and Zobel 2004b], making use of short document summaries to be kept in main memory in the form of a set of terms with the highest tf-idf values. During querying, all terms in the summaries that have been ranked against the original query are then used for sourcing expansion terms, thus bypassing disk access altogether and also avoiding the need of parsing the raw documents.…”

Section: Computational Efficiency (mentioning)

confidence: 99%

Citing publication: A Survey of Automatic Query Expansion in Information Retrieval (2012)


The relative ineffectiveness of information retrieval systems is largely caused by the inaccuracy with which a query formed by a few keywords models the actual user information need. One well known method to overcome this limitation is automatic query expansion (AQE), whereby the user's original query is augmented by new features with a similar meaning. AQE has a long history in the information retrieval community but it is only in the last years that it has reached a level of scientific and experimental maturity, especially in laboratory settings such as TREC. This survey presents a unified view of a large number of recent approaches to AQE that leverage various data sources and employ very different principles and techniques. The following questions are addressed: Why is query expansion so important to improve search effectiveness? What are the main steps involved in the design and implementation of an AQE component? What approaches to AQE are available and how do they compare? Which issues must still be resolved before AQE becomes a standard component of large operational information retrieval systems (e.g., search engines)?
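The first citation statement describes keeping a short in-memory summary of each document, made of its highest tf-idf terms, and sourcing expansion terms from the summaries of the top-ranked documents instead of fetching and parsing their full text. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not Billerbeck and Zobel's implementation: the whitespace tokenizer, the tf-idf variant (raw tf × log(N/df)), the summary size, and the rule of scoring candidates by their summed summary weights are all hypothetical choices.

```python
import math
from collections import Counter


def build_summaries(docs, summary_size):
    """Keep only the `summary_size` highest tf-idf terms of each document.

    Assumption: tf-idf = raw term frequency * log(N / df) with whitespace
    tokenization; the cited work may tokenize and weight differently.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: each term once per doc
    summaries = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        # Highest weight first; ties broken alphabetically for determinism.
        best = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:summary_size]
        summaries.append(dict(best))
    return summaries


def expansion_terms(summaries, ranked_doc_ids, query_terms, k_docs, n_terms):
    """Choose expansion terms from the in-memory summaries of the top k_docs
    documents only, so no full text is read from disk or parsed at query time.

    Hypothetical scoring rule: a candidate's score is the sum of its summary
    weights across those documents; original query terms are skipped.
    """
    scores = Counter()
    for doc_id in ranked_doc_ids[:k_docs]:
        for term, weight in summaries[doc_id].items():
            if term not in query_terms:
                scores[term] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


# Usage: documents 0 and 2 stand in for the top-ranked results of the query.
docs = [
    "query expansion adds related terms to the query",
    "disk access is slow compared with memory access",
    "expansion terms come from top ranked documents",
]
summaries = build_summaries(docs, summary_size=3)
print(expansion_terms(summaries, [0, 2], {"query", "expansion"}, k_docs=2, n_terms=3))
```

The summaries are computed once at indexing time; at query time only small dictionaries already in memory are consulted, which is the source of the saving the statement describes.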


“…Otherwise, we report Mean Average Precision (MAP) as implemented by the trec_eval program⁴. In calculating the statistical significance between score count samples, we use a paired randomization test, as described by Smucker et al. [18].…”

Section: Discussion (mentioning)

confidence: 99%
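The evaluation described in the statement above rests on two standard computations: uninterpolated average precision per query, averaged into MAP, and a paired randomization test on per-query differences. The sketch below is an illustration, not trec_eval's code; the function names, the number of trials, and the fixed seed are assumptions. The test randomly flips the sign of each per-query difference, which is the usual Monte Carlo form of the paired randomization test.

```python
import random


def average_precision(ranking, relevant):
    """Uninterpolated AP for one query: the sum of precision@k at each rank k
    holding a relevant document, divided by the number of relevant documents
    (relevant documents that are never retrieved contribute 0)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP over the judged queries; runs maps query id -> ranked doc ids,
    qrels maps query id -> set of relevant doc ids."""
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization test on per-query scores (e.g. AP).

    Under the null hypothesis the two systems are interchangeable, so each
    per-query difference is equally likely to have either sign. The p-value
    is the fraction of random sign assignments whose mean difference is at
    least as large in magnitude as the observed mean difference.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(permuted) >= observed:
            extreme += 1
    return extreme / trials
```

Given per-query AP values for a baseline run and an expanded run, `paired_randomization_test(ap_expanded, ap_baseline)` returns an approximate p-value; raising `trials` reduces its Monte Carlo error.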

“…Research in the area of query expansion includes Billerbeck and Zobel, who conduct several rounds of experiments examining the use of auxiliary data structures for improving the efficiency of query expansion [5,4]. They discovered that using short summaries of documents significantly reduced the time needed to analyze the documents used for feedback, which they considered to be the largest bottleneck during automatic expansion.…”

Section: Related Work (mentioning)

confidence: 99%

Citing publication: Efficiency optimizations for interpolating subqueries (2011). Cartright; Allan. Proceedings of the 20th ACM International Conference on Information and Knowledge Management


A large class of queries can be viewed as linear combinations of smaller subqueries. Additionally, many situations arise when part or all of one subquery has been preprocessed or has cached information, while another subquery requires full processing. This type of query is common, for example, in relevance feedback settings where the original query has been run to produce a set of expansion terms, but the expansion terms still need to be processed. We investigate mechanisms to reduce the time needed to process queries of this nature. We use RM3, a variant of the Relevance Model scoring algorithm, as our instantiation of this arrangement. We examine the different scenarios that can arise when we have access to the internal structure of each subquery. Given this additional information, we investigate methods to utilize this information, reducing processing costs substantially. Depending on the amount of accessibility we have into the subqueries, we can reduce processing costs by over 80% without affecting the score of the final results.
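The abstract treats an expanded query as a linear combination of an original subquery, whose results may already be cached, and an expansion subquery that still needs processing. A minimal sketch of that arrangement follows, assuming a scorer that is linear in the query term weights (here sum over w of P(w|Q) · log P(w|d)) and an RM3-style interpolation weight `lam` (a hypothetical name). Under that assumption, interpolating the two subquery scores gives exactly the score of the interpolated query model, so cached original-query scores can be reused. It shows the arrangement only, not Cartright and Allan's pruning optimizations.

```python
import math


def linear_score(query_model, doc_log_probs, unseen_log_prob=-10.0):
    """Score a document as sum over w of P(w|Q) * log P(w|d).

    Assumption: doc_log_probs holds already-smoothed log-probabilities, and
    terms missing from it get the fixed `unseen_log_prob` (a hypothetical
    stand-in for real smoothing).
    """
    return sum(weight * doc_log_probs.get(w, unseen_log_prob)
               for w, weight in query_model.items())


def rm3_query_model(original, relevance_model, lam):
    """RM3-style query model: lam * P(w|Q) + (1 - lam) * P(w|R)."""
    terms = set(original) | set(relevance_model)
    return {w: lam * original.get(w, 0.0) + (1 - lam) * relevance_model.get(w, 0.0)
            for w in terms}


def interpolated_scores(docs, original, relevance_model, lam, cache):
    """Score each document for the expanded query as
    lam * score(original) + (1 - lam) * score(relevance_model).

    Because linear_score is linear in the query weights, this equals the score
    of rm3_query_model(...). Original-subquery scores are taken from `cache`
    when present, so only the expansion subquery has to be evaluated.
    """
    results = {}
    for doc_id, log_probs in docs.items():
        if doc_id not in cache:
            cache[doc_id] = linear_score(original, log_probs)
        expansion = linear_score(relevance_model, log_probs)
        results[doc_id] = lam * cache[doc_id] + (1 - lam) * expansion
    return results


# Usage with toy, hand-made document models.
docs = {
    "d1": {"query": math.log(0.2), "expansion": math.log(0.1)},
    "d2": {"disk": math.log(0.3), "query": math.log(0.05)},
}
original = {"query": 1.0}
relevance_model = {"query": 0.5, "expansion": 0.5}
cache = {}  # would normally be filled when the original query was first run
print(interpolated_scores(docs, original, relevance_model, 0.6, cache))
```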


“…All methods aim to generate additional query terms that are "semantically" or statistically related to the original query terms, often producing queries with more than 50 or 100 terms and appropriately chosen weights. Given the additional uncertainty induced by the expansion terms, such queries are usually considered as disjunctive queries and incur very high execution costs for a DBMS-style query processing [4,5]. The various methods differ in their sources that they exploit for inferring correlated terms: explicit relationships in thesauri, explicit relevance feedback, pseudo relevance feedback, query associations derived from query logs and click streams, summary snippets of web search engine results, extended topic descriptions (available in benchmarks), or combinations of various techniques.…”

Section: Related Work, 2.1 Query Expansion (mentioning)

confidence: 99%
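The statement above notes that expanded queries, often with 50 to 100 terms, are usually evaluated as disjunctive (OR) queries and are therefore expensive. A minimal term-at-a-time sketch over a toy in-memory inverted index shows why: every posting of every query term is read and accumulated, so the work grows with the total length of the inverted lists touched. The index layout, the weighting (query weight × term frequency), and the postings counter are illustrative assumptions, not the cost model of the cited papers.

```python
import heapq
from collections import defaultdict


def disjunctive_top_k(index, weighted_query, k):
    """Term-at-a-time evaluation of an OR (disjunctive) query.

    index: term -> list of (doc_id, term_frequency) postings (toy layout).
    weighted_query: term -> query weight, original plus expansion terms.
    Any document containing any query term gets an accumulator, and every
    posting of every query term is read. Returns the top-k (doc_id, score)
    pairs and the number of postings processed.
    """
    accumulators = defaultdict(float)
    postings_read = 0
    for term, q_weight in weighted_query.items():
        for doc_id, tf in index.get(term, []):
            accumulators[doc_id] += q_weight * tf  # hypothetical weighting: w_q * tf
            postings_read += 1
    top = heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])
    return top, postings_read


# Usage: the same toy index queried before and after expansion.
index = {
    "query": [(1, 2), (3, 1)],
    "expansion": [(1, 1), (2, 1)],
    "retrieval": [(2, 2), (3, 1), (4, 1)],
    "feedback": [(1, 1), (4, 2)],
    "terms": [(2, 1), (3, 2), (4, 1), (5, 1)],
}
original = {"query": 1.0, "expansion": 1.0}
expanded = {**original, "retrieval": 0.5, "feedback": 0.5, "terms": 0.5}
for name, q in (("original", original), ("expanded", expanded)):
    top, cost = disjunctive_top_k(index, q, k=2)
    print(name, top, "postings read:", cost)
```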

“…For difficult retrieval tasks like the above, query expansion can improve precision@top-k, recall, as well as uninterpolated mean average precision (MAP) by a significant margin (see, e.g., [20,21]). However, in contrast to a mere benchmark setting such as TREC, applying these techniques in a real application with unpredictable ad-hoc queries (e.g., in digital libraries, intranet search, or web communities) faces three major problems [4,5]: 1) The threshold for selecting expansion terms needs to be carefully hand-tuned, and this is highly dependent on the application's corpus and query workload. 2) An inappropriate choice of the sensitive expansion threshold may result in either not achieving the desired improvement in recall (if the threshold is set too conservatively) or in high danger of topic dilution (if the query is expanded too aggressively).…”

Section: Introduction (mentioning)

confidence: 99%
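The first two problems in the statement above concern a fixed similarity threshold for selecting expansion terms. A minimal sketch, assuming every candidate term already has a similarity score to the query (how it is computed is left out, and the candidate scores below are invented for illustration): candidates at or above the threshold are added with their similarity as weight. The helper `original_weight_share` is a hypothetical, crude proxy for topic dilution, namely the fraction of total query weight still held by the original terms.

```python
def expand_with_threshold(query_terms, candidates, threshold):
    """Original terms get weight 1.0; every candidate whose similarity to the
    query is at least `threshold` is added with its similarity as weight.

    candidates: term -> similarity in [0, 1]. How similarities are obtained
    (co-occurrence, thesaurus, feedback documents) is outside this sketch.
    """
    expanded = {term: 1.0 for term in query_terms}
    for term, similarity in candidates.items():
        if term not in expanded and similarity >= threshold:
            expanded[term] = similarity
    return expanded


def original_weight_share(expanded, query_terms):
    """Fraction of the total query weight held by the original terms
    (an illustrative proxy for how far expansion shifts the query's focus)."""
    return sum(expanded[t] for t in query_terms) / sum(expanded.values())


# Invented candidate similarities for the query "cheap flights".
query = ["cheap", "flights"]
candidates = {"airfare": 0.82, "ticket": 0.74, "travel": 0.55,
              "booking": 0.51, "holiday": 0.38, "weather": 0.21}
for threshold in (0.8, 0.5, 0.2):
    q = expand_with_threshold(query, candidates, threshold)
    print(threshold, len(q) - len(query), round(original_weight_share(q, query), 2))
```

Lowering the threshold from 0.8 to 0.2 grows the expansion from one term to six and shrinks the original terms' share of the query weight, which is the tuning sensitivity the statement describes.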

Citing publication: Efficient and self-tuning incremental query expansion for top-k query processing (2005). Theobald; Schenkel; Weikum. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval


We present a novel approach for efficient and self-tuning query expansion that is embedded into a top-k query processor with candidate pruning. Traditional query expansion methods select expansion terms whose thematic similarity to the original query terms is above some specified threshold, thus generating a disjunctive query with much higher dimensionality. This poses three major problems: 1) the need for hand-tuning the expansion threshold, 2) the potential topic dilution with overly aggressive expansion, and 3) the drastically increased execution cost of a high-dimensional query. The method developed in this paper addresses all three problems by dynamically and incrementally merging the inverted lists for the potential expansion terms with the lists for the original query terms. A priority queue is used for maintaining result candidates, the pruning of candidates is based on Fagin's family of top-k algorithms, and optionally probabilistic estimators of candidate scores can be used for additional pruning. Experiments on the TREC collections for the 2004 Robust and Terabyte tracks demonstrate the increased efficiency, effectiveness, and scalability of our approach.
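The abstract bases candidate pruning on Fagin's family of top-k algorithms. As background, here is a compact sketch of Fagin's Threshold Algorithm (TA) with summed scores. It is not the incremental-merge method of Theobald et al.; it assumes every list is sorted by descending score, that random access to any document's score in any list is available (simulated with dictionaries), and that a document missing from a list scores 0 there. TA reads the lists in parallel and stops once the k-th best complete score reaches the threshold formed by the scores at the current depth, because no unseen document can score higher.

```python
import heapq


def threshold_algorithm(sorted_lists, k):
    """Fagin's Threshold Algorithm (TA) with sum aggregation.

    sorted_lists: one list per query term of (doc_id, score) pairs sorted by
    descending score. A document absent from a list scores 0 there.
    Returns (top-k as (score, doc_id) pairs, best first; sorted-access depth).
    """
    lookup = [dict(lst) for lst in sorted_lists]  # simulated random access
    seen = {}  # doc_id -> complete aggregated score
    max_depth = max((len(lst) for lst in sorted_lists), default=0)
    for depth in range(max_depth):
        threshold = 0.0  # upper bound on the score of any unseen document
        for lst in sorted_lists:
            if depth < len(lst):
                doc_id, score = lst[depth]
                threshold += score
                if doc_id not in seen:
                    seen[doc_id] = sum(table.get(doc_id, 0.0) for table in lookup)
        top = heapq.nlargest(k, ((s, d) for d, s in seen.items()))
        # Stop early once the k-th best complete score reaches the threshold.
        if len(top) == k and top[-1][0] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, ((s, d) for d, s in seen.items())), max_depth


# Usage: three per-term lists over the same three documents.
lists = [
    [("d1", 0.9), ("d2", 0.8), ("d3", 0.1)],
    [("d2", 0.9), ("d1", 0.7), ("d3", 0.2)],
    [("d3", 0.5), ("d2", 0.4), ("d1", 0.1)],
]
print(threshold_algorithm(lists, k=1))
```

For k = 1 the algorithm stops after two rounds of sorted access instead of reading the lists to the end, which is the kind of saving candidate pruning aims for.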
