2022
DOI: 10.48550/arxiv.2204.11447
Preprint

Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models

Abstract: A retrieval model should not only interpolate the training data but also extrapolate well to the queries that are rather different from the training data. While dense retrieval (DR) models have been demonstrated to achieve better retrieval performance than the traditional term-based retrieval models, we still know little about whether they can extrapolate. To shed light on the research question, we investigate how DR models perform in both the interpolation and extrapolation regimes. We first investigate the d…
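The abstract contrasts an interpolation regime (test queries similar to the training queries) with an extrapolation regime (test queries dissimilar from them). The sketch below is an illustration only, not the paper's actual evaluation protocol: it partitions test queries by their maximum cosine similarity to training-query embeddings and scores a per-query retrieval metric on each partition. The embedding arrays, the similarity threshold, and the per-query scores are all assumed placeholders.

import numpy as np

def max_similarity_to_train(test_emb: np.ndarray, train_emb: np.ndarray) -> np.ndarray:
    # For each test query embedding, return its maximum cosine similarity
    # to any training query embedding.
    test_n = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    train_n = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    return (test_n @ train_n.T).max(axis=1)

def split_regimes(test_emb: np.ndarray, train_emb: np.ndarray, threshold: float = 0.8):
    # Queries highly similar to the training data fall into the interpolation
    # regime, the rest into the extrapolation regime. The threshold value is
    # an arbitrary illustrative choice, not taken from the paper.
    sims = max_similarity_to_train(test_emb, train_emb)
    return sims >= threshold, sims < threshold  # boolean masks

# Usage sketch: per_query_scores holds, e.g., per-query nDCG@10 values.
# interp_mask, extrap_mask = split_regimes(test_emb, train_emb)
# print("interpolation:", per_query_scores[interp_mask].mean())
# print("extrapolation:", per_query_scores[extrap_mask].mean())

Reporting the metric separately on the two partitions makes the interpolation/extrapolation gap visible, which is the kind of comparison the abstract describes.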

Cited by 3 publications (2 citation statements)
References 38 publications
“…Previous studies have observed that fine-tuning PLM rankers on subsets of the available training data decreases search effectiveness and, similarly, that increasing the size of the training data tends to improve search effectiveness. These types of observations and preliminary findings are reported for MS MARCO [13,24,28,33,69] and in the case of domain adaptation [26,31,43,59]. However, these conditions have never been systematically evaluated, which we do in our study.…”
Section: Related Work (citation type: supporting)
Confidence: 60%
“…These campaigns follow the Cranfield paradigm [9] to create relevance judgements on the pooled output of the participating systems. Recently there has been a growing interest in evaluating the retrieval performance of retrieval models for domain-specific retrieval tasks [2,13,14,27,36] including the medical domain [22,23,35]. Domain-specific retrieval tasks often lack a reliable test collection with human relevance judgments following the Cranfield paradigm [22,27].…”
Section: Introduction (citation type: mentioning)
Confidence: 99%