Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-78646-7_36

Probabilistic Document Length Priors for Language Models

Abstract: This paper addresses the issue of devising a new document prior for the language modeling (LM) approach to Information Retrieval. The prior is based on term statistics, is derived in a probabilistic fashion, and offers a novel way of considering document length. Furthermore, we developed a new way of combining document length priors with the query likelihood estimation, based on the risk of accepting the latter as a score. This prior has been combined with a document retrieval language model that uses …
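To make the idea of combining a document length prior with a query likelihood score concrete, here is a minimal sketch. It is not the prior or the risk-based combination proposed in the paper (the abstract is truncated and does not give those formulas). It assumes a simple prior proportional to document length and Dirichlet-smoothed query likelihood, and the function name `score_documents`, the parameter `mu`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def score_documents(query, docs, mu=10.0, use_length_prior=True):
    """Return one log-score per document: log P(d) + sum over q of log P(q | d).

    Assumes every document is a non-empty list of tokens.
    """
    coll_tf = Counter(term for doc in docs for term in doc)
    coll_len = sum(coll_tf.values())
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed prior: P(d) proportional to document length (not the paper's prior).
        score = math.log(len(doc) / coll_len) if use_length_prior else 0.0
        for term in query:
            if coll_tf[term] == 0:
                continue  # skip query terms that never occur in the collection
            p_coll = coll_tf[term] / coll_len
            # Dirichlet-smoothed estimate of P(term | doc).
            score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
        scores.append(score)
    return scores


docs = [["length", "prior", "language", "model"], ["length", "prior"]]
with_prior = score_documents(["prior"], docs)
without_prior = score_documents(["prior"], docs, use_length_prior=False)
```

In this toy call the query likelihood alone ranks the shorter document first, while adding the assumed length prior moves the longer document to the top, which is the kind of effect a length prior is meant to control.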

Citations: cited by 11 publications (14 citation statements)
References: 15 publications
“…In contrast to this previous work, QSDM achieves significant improvements in all retrieval metrics, when compared to the SDM baseline 4. Figure 5 compares the effectiveness of these two methods on the two query sets based on the GOV2 corpus (TREC-06 and TREC-07).…”
Section: Performance On the Gov2 Collection
Citation type: contrasting
confidence: 79%
“…The presence of stopwords in the text (modeled by the features stopCover and fracStops) is positively correlated with how informative the text is [14,26], and documents with very few stopwords are unlikely to be relevant. The importance of document length (numVisTerm) for determining the document relevance is in line with previous research on document length priors [4,16,31]. Similarly, incorporating document cohesiveness (modeled in this work by the entropy feature) into the retrieval models was found to be beneficial in the past [2,17].…”
Section: Feature Importance
Citation type: supporting
confidence: 79%
“…The element length (number of tokens in the element textual content) seems to be the most used as source of evidence (Banerjee & Han, 2009; Blanco & Barreiro, 2008; Ganguly et al., 2010; Kamps et al., 2004; Lalmas, 2009; Ogilvie & Callan, 2004, 2005; Pehcevski, Thom, & Tahaghoghi, 2005; Sigurbjörnsson, 2006; Sigurbjörnsson, Kamps, & Rijke, 2004). Element length prior influences the relative ranking by favoring longest elements.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In our case, we classify documents based on their domain into three classes, just like the field weights. We add this document-weights w_D to compute a final retrieval score as [2]:…”
Section: Idf I
Citation type: mentioning
confidence: 99%