2023
DOI: 10.48550/arxiv.2301.02998
Preprint

InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

Abstract: We carried out a reproducibility study of the InPars recipe for unsupervised training of neural rankers [4]. As a by-product of this study, we developed a simple yet effective modification of InPars, which we called InPars-light. Unlike InPars, InPars-light uses only the freely available language model BLOOM and 7x-100x smaller ranking models. On all five English retrieval collections (used in the original InPars study) we obtained substantial (7-30%) and statistically significant improvements over BM25 in nDCG or MRR. […]
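The "7x-100x smaller ranking models" mentioned in the abstract are compact rerankers applied on top of first-stage BM25 retrieval. Below is a minimal sketch of that reranking stage, assuming the sentence-transformers CrossEncoder API and a public MiniLM checkpoint as an illustrative stand-in; it is not necessarily the exact model or scoring setup from the paper.

```python
# Minimal sketch: rescoring BM25 candidates with a compact cross-encoder.
# The checkpoint and example texts are illustrative assumptions.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # ~22M parameters

query = "unsupervised training of neural rankers"
# In practice these candidates come from a first-stage BM25 retriever;
# they are hard-coded here to keep the sketch self-contained.
candidates = [
    "InPars generates synthetic queries with a large language model.",
    "BM25 is a classical lexical ranking function.",
    "Cross-encoders score a query and a document jointly.",
]

# Score each (query, document) pair jointly, then sort by descending score.
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(scores, candidates), key=lambda x: x[0], reverse=True)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```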

Cited by 1 publication (1 citation statement)
References: 33 publications (57 reference statements)
“…Data augmentation for information retrieval (IR) has gained attention as a promising research area. Previous studies use large language models (LLMs) (Zhao et al., 2023) to generate synthetic training data for retrievers (Jeronymo et al., 2023; Dai et al., 2023; Boytsov et al., 2023; Bonifacio et al., 2022), significantly improving the effectiveness of unsupervised retrievers. Specifically, these studies all build pseudo query-document pairs by generating synthetic queries given documents in an existing corpus.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
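The citation statement above describes the core InPars-style augmentation step: prompt an LLM with a few document-to-query examples so it emits a synthetic query for a new document, yielding a pseudo (query, document) training pair. Below is a minimal sketch of that step, assuming the Hugging Face transformers API and a small BLOOM checkpoint (bigscience/bloom-560m) as a stand-in; the prompt wording, example pairs, and sampling settings are illustrative, not the authors' exact configuration.

```python
# Minimal sketch of InPars-style synthetic query generation with BLOOM.
# All prompt examples and generation settings here are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-560m"  # small stand-in; the paper uses larger BLOOM variants
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Few-shot prompt: document -> relevant query pairs, then the target document.
prompt = (
    "Document: The Manhattan Project was a research effort that produced the first nuclear weapons.\n"
    "Relevant Query: who developed the first nuclear weapons\n\n"
    "Document: Photosynthesis converts light energy into chemical energy in plants.\n"
    "Relevant Query: how do plants make energy from sunlight\n\n"
    "Document: BM25 is a bag-of-words ranking function used by search engines.\n"
    "Relevant Query:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=True,   # sampling encourages diverse synthetic queries
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
# Keep only the newly generated continuation and cut it at the first newline:
# that single line is the synthetic query for the last document in the prompt.
generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
synthetic_query = generated.split("\n")[0].strip()
print(synthetic_query)  # pairs with the last document as a positive training example
```

The resulting (synthetic query, document) pairs then serve as positive examples for training a ranker without any human relevance labels.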