2021
DOI: 10.48550/arxiv.2112.07708
Preprint
Learning to Retrieve Passages without Supervision

Abstract: Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive performance by training on large datasets of question-passage pairs. We investigate whether dense retrievers can be learned in a self-supervised fashion, and applied effectively without any annotations. We observe that existing pretrained models for retrieval struggle in this scenario, and propose a new pretraining scheme designed for retrieval: recurring span retrieval. We use recurring spans across passages in a …
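
As a rough illustration of the recurring-span idea described in the abstract, the sketch below builds pseudo question-passage pairs from spans that recur across different passages. The helper names and the exact pairing heuristic are assumptions made for illustration, not the paper's actual procedure.

```python
# Hypothetical sketch: when the same span of text recurs in two different
# passages, one passage can serve as a pseudo-query and the other as its
# positive passage, yielding question-passage-like pairs without annotation.
# Function names and the pairing heuristic are illustrative assumptions only.

from collections import defaultdict
from itertools import combinations


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurring_span_pairs(passages, span_len=5):
    """Build (pseudo_query, positive_passage) pairs from spans recurring
    across different passages."""
    span_to_passages = defaultdict(set)
    for idx, passage in enumerate(passages):
        for span in ngrams(passage.split(), span_len):
            span_to_passages[span].add(idx)

    pair_ids = set()
    for passage_ids in span_to_passages.values():
        if len(passage_ids) < 2:
            continue  # a span must recur in at least two distinct passages
        pair_ids.update(combinations(sorted(passage_ids), 2))

    # one passage acts as the pseudo-query side, the other as the positive
    return [(passages[q], passages[p]) for q, p in sorted(pair_ids)]


if __name__ == "__main__":
    docs = [
        "the suez canal connects the mediterranean sea to the red sea",
        "opened in 1869 the suez canal connects the mediterranean sea to asia",
    ]
    for query_side, positive_side in recurring_span_pairs(docs, span_len=5):
        print("pseudo-query side:", query_side)
        print("positive passage :", positive_side)
```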

Cited by 4 publications (3 citation statements)
References 18 publications
“…[166] proposes GPL, a method that uses generative pseudo-labeling for unsupervised domain adaptation of dense retrieval. [167] proposes a method called Spider, which enables unsupervised passage retrieval. [168] proposes a method that utilizes contrastive pre-training to learn embeddings for text and code.…”
Section: Multi-vector Representation
confidence: 99%
“…Pre-training. NLP has recently borrowed ideas from contrastive learning techniques in Computer Vision, with the goal of learning high-quality sentence or document representations without annotation [7,11,21,28,29]. The general idea consists in designing pre-training tasks that are better suited for subsequently training neural retrievers.…”
Section: Distillation, Hard Negative Mining and PLM Initialization
confidence: 99%
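
Such contrastive pre-training is commonly formulated as an in-batch InfoNCE-style objective. The snippet below is a generic sketch of that loss, not the exact setup of any cited paper; the temperature value, normalization, and batch pairing are assumptions.

```python
# Generic in-batch contrastive (InfoNCE-style) loss often used to pre-train
# dense retrievers without annotation: each query is paired with one positive
# passage, and the other passages in the batch serve as negatives.
# Illustrative sketch only; not the implementation from any cited paper.

import torch
import torch.nn.functional as F


def in_batch_contrastive_loss(query_emb, passage_emb, temperature=0.05):
    """query_emb, passage_emb: [batch, dim]; row i of passage_emb is the
    positive passage for row i of query_emb."""
    query_emb = F.normalize(query_emb, dim=-1)
    passage_emb = F.normalize(passage_emb, dim=-1)
    # similarity of every query against every passage in the batch
    logits = query_emb @ passage_emb.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


# toy usage with random embeddings standing in for encoder outputs
queries = torch.randn(8, 256)
passages = torch.randn(8, 256)
print(in_batch_contrastive_loss(queries, passages).item())
```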
“…Retrieval-based language models (R-LMs) have recently been shown to improve over standard neural models in a variety of tasks such as unconditional language modeling (Guu et al., 2018; He et al., 2020), machine translation (Zhang et al., 2018; Gu et al., 2018; Khandelwal et al., 2021), question answering (Karpukhin et al., 2020; Ram et al., 2021), and code generation (Hayati et al., 2018; Hashimoto et al., 2018). The key ingredient of R-LMs is their ability to utilize training examples at test time, without having to rely only on the information encoded in the model's weights.…”
Section: Introduction
confidence: 99%
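
The "utilize training examples at test time" ingredient quoted above can be pictured as a nearest-neighbour store over embedded training examples that is queried during inference. The class and similarity choice below are purely illustrative assumptions, not the implementation of any cited system.

```python
# Toy illustration of the retrieval-based LM idea: training examples are
# stored with vector keys and retrieved by similarity at test time, so the
# model can consult them rather than rely only on its weights.
# Purely illustrative; not any specific cited system.

import numpy as np


class ExampleStore:
    def __init__(self, keys, values):
        # keys: [n, dim] array of example embeddings; values: stored examples
        self.keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)
        self.values = values

    def retrieve(self, query, k=2):
        query = query / np.linalg.norm(query)
        scores = self.keys @ query          # cosine similarity
        top = np.argsort(-scores)[:k]
        return [(self.values[i], float(scores[i])) for i in top]


rng = np.random.default_rng(0)
store = ExampleStore(rng.normal(size=(5, 64)),
                     [f"train example {i}" for i in range(5)])
print(store.retrieve(rng.normal(size=64), k=2))
```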