2022
DOI: 10.48550/arxiv.2212.01340
Preprint

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

Cited by 1 publication (3 citation statements)
References 0 publications
“…The authors show this by conducting a post-hoc comparison of published works as well as an in-depth cost analysis of representative methods (BM25, Dense Passage Retrieval, SPLADE, and ColBERTv2) to arrive at conclusions that are broadly consistent with the observations around model inference of Scells et al (2022). Santhanam et al (2022b) use this fact to encourage the adoption of multidimensional leaderboards and motivate research on metrics that capture the overall utility of a retrieval or ranking method in a single quantity. They point to the Dynascores proposed by Ma et al (2021b) as one such measure that allows for a single ranking of a collection of systems.…”
Section: Designing Multidimensional Leaderboards
confidence: 89%
“…We too encourage the development of multidimensional leaderboards to incentivize research into efficient and effective systems. In fact, while Santhanam et al (2022b) argue for leaderboards that capture inference efficiency, we believe training efficiency too must be reflected in the overall utility of a retrieval and ranking system. In spite of arguments that training a model incurs a cost that is amortized and thus comparably insignificant, we note that retrieval and ranking models have a relatively short lifetime: As the data distribution shifts, models must often be re-trained or fine-tuned on fresh samples.…”
Section: Designing Multidimensional Leaderboards
confidence: 96%