Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval 2011
DOI: 10.1145/2009916.2010058

Pseudo test collections for learning web search ranking functions

Abstract: Test collections are the primary drivers of progress in information retrieval. They provide yardsticks for assessing the effectiveness of ranking functions in an automatic, rapid, and repeatable fashion and serve as training data for learning to rank models. However, manual construction of test collections tends to be slow, labor-intensive, and expensive. This paper examines the feasibility of constructing web search test collections in a completely unsupervised manner given only a large web corpus as input. W…

Cited by 33 publications (28 citation statements). References: 38 publications.

“…Weakly supervised learning refers to a learning strategy in which the query-document labels are automatically generated using an existing retrieval model, such as BM25. The use of pseudo-labels for training ranking models has been proposed by Asadi et al [109]. More recently, Dehghani et al [27] proposed to train neural ranking models using weak supervision and observed up to 35% improvement compared to BM25 which plays the role of weak labeler.…”
Section: Training Strategies (mentioning, confidence: 99%)
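The weak-labeling idea in the statement above can be illustrated with a small sketch: score a toy corpus against a query with Okapi BM25 and treat the top-ranked documents as pseudo-relevant training labels. The function names `bm25_scores` and `weak_labels`, the parameter values k1 = 1.2 and b = 0.75, the IDF variant, and the binary top-k labeling rule are all assumptions for illustration, not the procedure of Asadi et al. or Dehghani et al.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each distinct query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            # IDF with +1 inside the log so it stays positive (an assumed variant).
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(s)
    return scores

def weak_labels(query, docs, top_k=1):
    """Hypothetical labeler: top_k BM25-ranked documents get label 1, the rest 0."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(docs)
    for i in ranked[:top_k]:
        labels[i] = 1
    return labels

docs = [["web", "search", "ranking"], ["cooking", "recipes"], ["web", "pages", "web"]]
query = ["web", "ranking"]
print(weak_labels(query, docs, top_k=1))  # [1, 0, 0]
```

Only the first toy document contains both query terms, so it outranks the third document (which matches "web" twice) and becomes the single pseudo-relevant document when `top_k=1`.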
“…where σ(·) is the sigmoid function and w_r is the weight to be discussed later. There are several differences compared to Eqn (2). First, in Eqn (5), we treat the top-K ranked documents from the teacher model as positive instances and there is no negative instance.…”
Section: Incorporating Distillation Loss (mentioning, confidence: 99%)
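A minimal sketch of a loss with the properties the statement above describes: the sigmoid of the student's score, a per-rank weight w_r, the teacher's top-K documents as positives, and no negatives. The citing paper's Eqn (5) is not reproduced on this page, so the form -Σ_r w_r · log σ(s_r) and the function names below are assumptions, not that paper's exact definition.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def top_k_distillation_loss(student_scores, teacher_ranking, weights):
    """Assumed form: -sum_{r=1..K} w_r * log(sigmoid(s_r)).

    student_scores: dict mapping doc_id -> student model score
    teacher_ranking: doc_ids ordered by the teacher model, best first
    weights: w_1..w_K, one weight per teacher rank; K = len(weights)
    """
    loss = 0.0
    for w_r, doc_id in zip(weights, teacher_ranking):
        # Each of the teacher's top-K documents is a positive; no negatives are used.
        loss -= w_r * log_sigmoid(student_scores[doc_id])
    return loss
```

With all student scores at 0 and two unit weights, each term contributes log 2, so the loss is 2 log 2; raising the student's scores on the teacher's top documents lowers it.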
“…Azzopardi et al [3] propose several methods for sampling query terms from web documents for known-item search, while Asadi et al [2] avoid the problem by using anchor texts. Neither approach is applicable in the microblog setting due to a lack of both redundancy in the tweets and anchor texts.…”
Section: Generating Queries (mentioning, confidence: 99%)
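The query-sampling approach attributed to Azzopardi et al [3] can be sketched as drawing terms from a single document to form a query for which that document is the known item. The sketch below assumes one simple variant, sampling with replacement in proportion to term frequency. The function names and this weighting are assumptions and are not necessarily any of the specific models in that work.

```python
import random
from collections import Counter

def sample_known_item_query(doc_tokens, query_len, rng):
    """Draw query_len terms (with replacement) from one tokenized document,
    each term with probability proportional to its frequency in that document."""
    tf = Counter(doc_tokens)
    terms = list(tf)
    return rng.choices(terms, weights=[tf[t] for t in terms], k=query_len)

def build_known_item_topics(corpus, query_len=3, seed=0):
    """Map each doc_id to a sampled query; that document is its single known relevant item.
    Empty documents are skipped because there is nothing to sample from."""
    rng = random.Random(seed)
    return {doc_id: sample_known_item_query(tokens, query_len, rng)
            for doc_id, tokens in corpus.items() if tokens}
```

Fixing the seed keeps the generated topics reproducible, which matters when the pseudo topics are reused across experiments.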
“…Asadi et al [2] describe a method for generating pseudo test collections for training learning to rank methods for web retrieval. Their methods build on the idea that anchor text in web documents is a good source for sampling queries, and the documents that these anchors link to are regarded as relevant documents for the anchor text (query).…”
Section: Related Work (mentioning, confidence: 99%)
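The anchor-text construction described in the statement above can be sketched as follows: each distinct anchor text becomes a query, and the pages it links to become that query's relevant documents. The input format of (source, anchor text, target) triples, the lowercase and whitespace normalization, the skipping of self-links, and the `min_links` threshold are assumptions for illustration. The actual method of Asadi et al. may filter and weight anchors differently.

```python
from collections import defaultdict

def pseudo_qrels_from_anchors(links, min_links=1):
    """Build pseudo relevance judgments from anchor text (hypothetical helper).

    links: iterable of (source_url, anchor_text, target_url) triples.
    Returns {query: {target_url: number_of_links}}, keeping only queries
    with at least min_links links in total.
    """
    qrels = defaultdict(lambda: defaultdict(int))
    for source, anchor, target in links:
        query = " ".join(anchor.lower().split())  # normalize case and whitespace
        if not query or source == target:
            continue  # skip empty anchors and self-links (an assumption)
        qrels[query][target] += 1
    return {q: dict(targets) for q, targets in qrels.items()
            if sum(targets.values()) >= min_links}
```

Counting links per target keeps a simple signal of how strongly the anchor text points at each page, which a learning-to-rank pipeline could use as a graded label or simply binarize.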