Beyond bag‐of‐words: Bigram‐enhanced context‐dependent term weights

Dang, Edward Kai Fung; Luk, Robert W. P.; Allan, James

doi:10.1002/asi.23024

Cited by 10 publications (22 citation statements)

References 39 publications (74 reference statements)

“…Subsequently, Dang, Luk, and Allan () extended the work to include n‐grams in the B&D procedure. In particular, they showed that including bigrams (n = 2) could improve retrieval performance over unigram B&D, while larger values of n did not lead to further improvement.…”

Section: Model Formulations (mentioning)

confidence: 99%

“…In Dang et al. (), a bigram is defined as an ordered pair of words in a document after stop‐word removal. An additional requirement is that the adjacent members (t_i, t_{i+1}) should not be separated by any punctuation, excluding hyphens, in the document.…”

Section: Model Formulations (mentioning)

confidence: 99%
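
The bigram definition quoted above can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the lowercasing, the function name `extract_bigrams`, and the choice to treat an ASCII hyphen as a word separator that still allows a pair to form are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list; the quoted statement does not say which list was used.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "to", "is"}

def extract_bigrams(text, stop_words=STOP_WORDS):
    """Return ordered word pairs that are adjacent after stop-word removal
    and are not separated by punctuation other than hyphens."""
    bigrams = []
    # Break the text at any punctuation except hyphens, so no pair spans a break.
    for segment in re.split(r"[^\w\s-]+", text.lower()):
        # Inside a segment, whitespace and hyphens both delimit words (assumption).
        words = [w for w in re.split(r"[\s-]+", segment) if w and w not in stop_words]
        # Pair each remaining word with its successor.
        bigrams.extend(zip(words, words[1:]))
    return bigrams
```

For a sentence such as "The retrieval of context-dependent weights, in the bag of words.", this sketch pairs "context" with "dependent" across the hyphen but forms no pair across the comma.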

“…Dang et al. () identified two ingredients in their bigram B&D method that were needed to attain robust performance improvement over using unigrams only. First, they found that in calculating the idf weighting for bigrams, it was necessary to use a “local” document frequency.…”

Section: Model Formulations (mentioning)

confidence: 99%

“…Inspired by the promising results of the B&D method of Dang et al. (, ), we wish to investigate an adaptation of the method to introduce context‐dependence in the language modeling framework. In the setting of relevance feedback, we apply the B&D procedure to the RM3 relevance model.…”

Section: Model Formulations (mentioning)

confidence: 99%

“…Hence, the significance of our work is the demonstration of the usefulness of context consideration in the language model framework. It provides a new way to go beyond the bag‐of‐words representation in this framework (Dang, Luk, & Allan, ). Furthermore, together with the previous studies within the BM25 framework (Dang et al., ), our results show that the effectiveness of our method for using context information in IR is quite general and not limited to any specific retrieval model.…”

Section: Introduction (mentioning)

confidence: 99%

A context‐dependent relevance model

Dang; Luk; Allan (2015), Asso for Info Science & Tech. Self Cite.

Numerous past studies have demonstrated the effectiveness of the relevance model (RM) for information retrieval (IR). This approach enables relevance or pseudo-relevance feedback to be incorporated within the language modeling framework of IR. In the traditional RM, the feedback information is used to improve the estimate of the query language model. In this article, we introduce an extension of RM in the setting of relevance feedback. Our method provides an additional way to incorporate feedback via the improvement of the document language models. Specifically, we make use of the context information of known relevant and nonrelevant documents to obtain weighted counts of query terms for estimating the document language models. The context information is based on the words (unigrams or bigrams) appearing within a text window centered on query terms. Experiments on several Text REtrieval Conference (TREC) collections show that our context-dependent relevance model can improve retrieval performance over the baseline RM. Together with previous studies within the BM25 framework, our current study demonstrates that the effectiveness of our method for using context information in IR is quite general and not limited to any specific retrieval model.
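
As a rough illustration of the "text window centered on query terms" mentioned in the abstract, the sketch below collects the unigrams around each query-term occurrence. The function name `context_counts`, the `half_width` parameter, and the exclusion of the center word are assumptions; the abstract does not specify the window size or how the collected context is turned into weighted counts, so that step is left out.

```python
from collections import Counter

def context_counts(tokens, query_terms, half_width=2):
    """Count the words inside a window centered on each query-term occurrence."""
    counts = {q: Counter() for q in query_terms}
    for i, tok in enumerate(tokens):
        if tok in counts:
            # Clip the window at the document boundaries.
            lo = max(0, i - half_width)
            hi = min(len(tokens), i + half_width + 1)
            # Collect the neighbors on both sides, excluding the center word itself.
            counts[tok].update(tokens[lo:i] + tokens[i + 1:hi])
    return counts
```

A bigram variant would pair adjacent words inside each window instead of counting them individually.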
