Learning concept importance using a weighted dependence model

Bendersky, Michael; Metzler, Donald; Croft, W. Bruce

doi:10.1145/1718487.1718492

Cited by 148 publications

(180 citation statements)

References 25 publications

Citation statement types: Supporting; Mentioning (175); Contrasting

“…Even here there are limitations, since our lexical items are not easily aligned with those found in other collections. For this reason, we can not leverage external corpus statistics from, for example, Google or Wikipedia (Bendersky et al, 2011; Bendersky et al, 2010; Bendersky and Croft, 2008; Lease, 2009), or phrases from search logs (Svore et al, 2010).…”

Section: Motivation and Related Work (mentioning)

confidence: 99%

Using Zero-Resource Spoken Term Discovery for Ranked Retrieval

White, Oard, Jansen, et al. 2015

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.

“…However, recent research demonstrates that more complex retrieval models that incorporate phrases, term proximities and expansion terms can significantly outperform the standard bag-of-word models, especially in the context of large-scale web collections [6] [5] [7] [8] and longer, more complex queries [9] [10].…”

Section: Ranker M * A (mentioning)

confidence: 99%

“…Both of these methods incorporate textual features beyond query terms and were shown to be highly effective in prior work [6] [5].…”

Section: Ranker M * A (mentioning)

confidence: 99%

Two-Stage Learning to Rank for Information Retrieval

Dang, Bendersky, Croft 2013

Lecture Notes in Computer Science

Self Cite

Abstract. Current learning to rank approaches commonly focus on learning the best possible ranking function given a small fixed set of documents. This document set is often retrieved from the collection using a simple unsupervised bag-of-words method, e.g. BM25. This can potentially lead to learning a sub-optimal ranking, since many relevant documents may be excluded from the initially retrieved set. In this paper we propose a novel two-stage learning framework to address this problem. We first learn a ranking function over the entire retrieval collection using a limited set of textual features including weighted phrases, proximities and expansion terms. This function is then used to retrieve the best possible subset of documents over which the final model is trained using a larger set of query- and document-dependent features. Empirical evaluation using two web collections unequivocally demonstrates that our proposed two-stage framework, being able to learn its model from more relevant documents, outperforms current learning to rank approaches.

“…To deal with the last problem, Bendersky et al [1] extended recently the MRF-SD model to a weighted MRF-SD model (which we denote by WSD), in which the weight of a term and a pair of terms becomes dependent on the individual term and pair of terms. The scoring function is as follows:…”

Section: Related Work (mentioning)

confidence: 99%

“…to assign variable weights to unigrams and pairs of terms. However, the relationship between non-adjacent query terms is still ignored in [1] and the ordered and un-ordered pairs of terms are treated in the same way. Our model will go a step further: we will consider dependencies between non-adjacent characters as between…”

Section: Related Work (mentioning)

confidence: 99%

Modeling Variable Dependencies between Characters in Chinese Information Retrieval

Shi, Nie 2010

Information Retrieval Technology

Abstract. Chinese IR can work on words and/or character n-grams. In previous studies, when several types of index are used, independence is usually assumed between them, which obviously is not true in reality. In this paper, we propose a model for Chinese IR that integrates different types of dependency between Chinese characters. The role of a pair of dependent characters in the matching process is variable, depending on the pair's ability to describe the underlying meaning and to retrieve relevant documents. The weight of the pair is learnt using SVM. Our experiments on TREC and NTCIR Chinese collections show that our model can significantly outperform most existing approaches. The results confirm the necessity to integrate dependent pairs of characters in Chinese IR and to use them according to their possible contribution to IR.
