Is Retriever Merely an Approximator of Reader?
Preprint, 2020
DOI: 10.48550/arxiv.2010.10999

Cited by 17 publications (21 citation statements)
References 14 publications
“…The major improvements in this work are attributed to end-to-end training, which amounts to a type of distillation from the powerful T5 model into the retrieval model. It's interesting to compare this to more direct distillation methods (Izacard and Grave, 2020; Yang and Seo, 2020), which also reported similar gains. Our method also relies on a reader model indirectly, through the global filtering stage of generated questions in PAQ.…”
Section: Pretraining for Retrieval (mentioning)
confidence: 90%
“…The index consists of 1.2M 128-dimensional INT8 vectors (scalar-quantized and unsigned), which are the dense embeddings of the subset of the 21M passages in Wikipedia, filtered by a RoBERTa [30]-based binary classifier trained with logistic regression to exclude uninformative passages. Positives to train this classifier are the top 200 passages for each question on the Natural Questions dataset and the EfficientQA development set, retrieved by [40], a DPR retriever further finetuned on hard negatives using knowledge distillation from a DPR reader. Negatives are uniformly drawn from the set of 21M passages, excluding positives.…”
Section: Top Systems (mentioning)
confidence: 99%
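The INT8 index described in this excerpt can be illustrated with a minimal min/max scalar quantizer. The NumPy sketch below assumes per-dimension min/max scaling to unsigned 8-bit codes; the excerpt does not specify how the actual system chose its quantization parameters, so this is an illustrative assumption rather than the system's implementation.

```python
import numpy as np

def scalar_quantize_uint8(embeddings: np.ndarray):
    """Quantize float32 embeddings to unsigned INT8 with per-dimension min/max scaling.

    Returns the uint8 codes plus the (offset, scale) parameters needed to
    approximately reconstruct the original vectors at search time.
    """
    lo = embeddings.min(axis=0)           # per-dimension minimum
    hi = embeddings.max(axis=0)           # per-dimension maximum
    scale = (hi - lo) / 255.0
    scale[scale == 0] = 1.0               # guard against constant dimensions
    codes = np.round((embeddings - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Approximate reconstruction of the original float vectors."""
    return codes.astype(np.float32) * scale + lo

# Toy stand-in for passage embeddings. At the scale in the excerpt,
# 1.2M x 128 uint8 codes take roughly 150 MB, versus ~600 MB as float32.
vectors = np.random.randn(1000, 128).astype(np.float32)
codes, lo, scale = scalar_quantize_uint8(vectors)
approx = dequantize(codes, lo, scale)
```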
“…To make the retrieval tractable, the term-level function is approximated to first retrieve an initial set of candidates, which are then re-ranked with the true score. In the context of question answering, knowledge distillation has been used to train retrievers, either using the attention scores of the reader of the downstream task as synthetic labels (Izacard & Grave, 2021), or the relevance score from a cross-encoder (Yang & Seo, 2020). Luan et al. (2020) compare, theoretically and empirically, the performance of sparse and dense retrievers, including bi-, cross- and poly-encoders.…”
Section: Related Work (mentioning)
confidence: 99%
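Both distillation routes mentioned in this excerpt (reader attention scores in Izacard & Grave, 2021; cross-encoder relevance scores in Yang & Seo, 2020) amount to soft-label supervision of a bi-encoder retriever over a pool of candidate passages. The PyTorch sketch below shows one common form of that objective, a temperature-scaled KL divergence between the teacher's relevance distribution and the retriever's dot-product score distribution; the exact losses, score definitions, and temperatures in those papers differ, so treat this as an illustrative assumption rather than either paper's implementation.

```python
import torch
import torch.nn.functional as F

def retriever_distillation_loss(
    query_emb: torch.Tensor,       # [B, d]    bi-encoder query embeddings
    passage_emb: torch.Tensor,     # [B, K, d] embeddings of K candidate passages per query
    teacher_scores: torch.Tensor,  # [B, K]    reader attention or cross-encoder relevance scores
    temperature: float = 1.0,
) -> torch.Tensor:
    """KL divergence between the teacher's relevance distribution over the K
    candidates and the bi-encoder's dot-product score distribution."""
    student_scores = torch.einsum("bd,bkd->bk", query_emb, passage_emb)
    student_logp = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_p = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean")
```

In this formulation the teacher only needs to score the K retrieved candidates, so no passage-level relevance labels are required, which is what makes reader-to-retriever distillation attractive for open-domain QA.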