2021 · Preprint
DOI: 10.48550/arxiv.2110.07367
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking

Cited by 15 publications (37 citation statements) · References 25 publications
“…Instead of compressing multi-vector representations as we do, much recent work has focused on improving the quality of single-vector models, which are often very sensitive to the specifics of supervision. This line of work can be decomposed into three directions: (1) distillation of more expressive architectures (Hofstätter et al., 2020), including explicit denoising (Qu et al., 2021; Ren et al., 2021b), (2) hard negative sampling (Xiong et al., 2020; Zhan et al., 2020a), and…”
Section: Improving the Quality of Single-Vector Representations
confidence: 99%
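The hard-negative-sampling direction cited in the excerpt above is straightforward to sketch: score the corpus with the current encoder, take the top-ranked passages, drop the labeled positives, and sample negatives from what remains. Below is a minimal Python sketch, assuming a brute-force dot-product search in place of a real ANN index; the function and parameter names are hypothetical, not taken from any of the cited papers.

```python
# Minimal sketch of hard-negative mining for a dense retriever (ANCE-style).
# Assumption: embeddings fit in memory, so a brute-force dot product replaces an ANN index.
import numpy as np


def mine_hard_negatives(query_emb, passage_embs, positive_ids, k=200, n_neg=8, rng=None):
    """Retrieve the top-k passages for a query with the current encoder and
    sample negatives from the highly ranked, non-positive ones."""
    rng = rng or np.random.default_rng(0)
    scores = passage_embs @ query_emb              # dot-product relevance
    top_k = np.argsort(-scores)[:k]                # ids of the highest-scoring passages
    candidates = [pid for pid in top_k if pid not in positive_ids]
    return rng.choice(candidates, size=min(n_neg, len(candidates)), replace=False)


# Toy usage: 1,000 random 128-dim passage embeddings, one labeled positive (id 3).
rng = np.random.default_rng(0)
passage_embs = rng.normal(size=(1000, 128)).astype(np.float32)
query_emb = rng.normal(size=128).astype(np.float32)
print(mine_hard_negatives(query_emb, passage_embs, positive_ids={3}, rng=rng))
```

In practice the mined negatives are refreshed periodically as the encoder is updated, which is what makes them "hard" with respect to the current model.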
“…We then collect w-way tuples consisting of a query, a highly-ranked passage (or labeled positive), and one or more lower-ranked passages. Like RocketQAv2 (Ren et al., 2021b), we use a KL-divergence loss to distill the cross-encoder's scores into the ColBERT architecture. We also employ in-batch negatives per GPU, where a cross-entropy loss is applied between the query and its positive against all passages corresponding to other queries in the same batch.…”
Section: Supervision
confidence: 99%
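The supervision described in this excerpt combines two terms: a KL-divergence loss that distills the cross-encoder's scores over each query's w-way candidate list into the student, and an in-batch-negative cross-entropy loss. Here is a minimal PyTorch sketch under the assumption that the student produces a plain [batch, w] score tensor (the actual ColBERT late-interaction scoring is omitted); the temperature and the equal weighting of the two losses are illustrative assumptions.

```python
# Sketch of KL-divergence distillation plus an in-batch-negative cross-entropy term.
import torch
import torch.nn.functional as F


def distillation_loss(student_scores, teacher_scores, temperature=1.0):
    """KL(teacher || student) over each query's w-way candidate list.
    Both tensors have shape [batch, w]: one positive plus hard negatives per query."""
    log_p_student = F.log_softmax(student_scores / temperature, dim=-1)
    p_teacher = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")


def in_batch_negative_loss(query_embs, pos_passage_embs):
    """Cross-entropy where each query's positive competes against the
    positives of every other query in the batch ([batch, dim] inputs)."""
    scores = query_embs @ pos_passage_embs.T       # [batch, batch] similarity matrix
    labels = torch.arange(scores.size(0))          # the diagonal entries are the positives
    return F.cross_entropy(scores, labels)


# Toy usage: 4 queries, 8-way candidate lists, 128-dim embeddings.
student, teacher = torch.randn(4, 8), torch.randn(4, 8)
q, p = torch.randn(4, 128), torch.randn(4, 128)
print((distillation_loss(student, teacher) + in_batch_negative_loss(q, p)).item())
```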
“…In recent years, it has become increasingly popular to apply KD to representation learning. For example, in [16,36,52], the representation models are distilled from the relevance scores predicted by the re-ranking models, and in [7,46], the representation models are pretrained by distilling knowledge from the pseudo labels annotated by the teacher models. It is generally believed that KD contributes to representation learning thanks to the exploitation of massive unlabeled data [46] and label smoothing [48].…”
Section: Related Work
confidence: 99%
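The pseudo-labeling mentioned in this excerpt usually means letting a teacher, typically a cross-encoder re-ranker, score unlabeled query–passage pairs and keeping only the confidently scored ones for training. A small sketch follows; the `teacher_score` callable and the two thresholds are illustrative assumptions rather than values from the cited works.

```python
# Sketch of teacher-based pseudo-labeling with confidence thresholds (RocketQA-style denoising).
from typing import Callable, Iterable, Tuple


def pseudo_label(pairs: Iterable[Tuple[str, str]],
                 teacher_score: Callable[[str, str], float],
                 pos_threshold: float = 0.9,
                 neg_threshold: float = 0.1):
    """Split unlabeled (query, passage) pairs into pseudo positives and negatives,
    discarding pairs the teacher is unsure about."""
    positives, negatives = [], []
    for query, passage in pairs:
        score = teacher_score(query, passage)
        if score >= pos_threshold:
            positives.append((query, passage))
        elif score <= neg_threshold:
            negatives.append((query, passage))
        # scores in between are treated as too noisy to train on
    return positives, negatives


# Toy usage with a dummy teacher that scores by word overlap with the query.
def dummy_teacher(query: str, passage: str) -> float:
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)


pairs = [("what is dense retrieval", "dense retrieval encodes queries and passages as vectors"),
         ("what is dense retrieval", "the weather today is sunny")]
print(pseudo_label(pairs, dummy_teacher, pos_threshold=0.5, neg_threshold=0.1))
```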
“…For example, the retriever can be improved by distilling from the ranker with a more capable architecture (Ding et al., 2020; Hofstätter et al., 2020), and the ranker can be improved with training instances generated by the retriever (Huang et al., 2020). Based on these observations, Ren et al. (2021) proposed dynamic listwise distillation to jointly optimize the two modules, so that they improve each other and contribute to the final ranking performance.…”
Section: End-to-End IR Based on PTMs
confidence: 99%
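The dynamic listwise distillation credited to Ren et al. (2021) above can be sketched roughly as follows: the retriever and the re-ranker score the same candidate list for each query, both score vectors are normalized into distributions, and a KL term pulls the retriever's distribution toward the re-ranker's while both modules remain trainable. The linear scorers and the single supervised term below are stand-ins for illustration, not the paper's full training setup.

```python
# Rough sketch of joint retriever/re-ranker training with a listwise KL term.
# Assumptions: linear "scorers" replace the dual-encoder and cross-encoder,
# and candidate 0 of every list is the labeled positive.
import torch
import torch.nn.functional as F


def dynamic_listwise_distillation(retriever_scores, reranker_scores):
    """KL divergence between the retriever's and the re-ranker's relevance
    distributions over each query's candidate list ([batch, list_size])."""
    retriever_logprobs = F.log_softmax(retriever_scores, dim=-1)
    reranker_probs = F.softmax(reranker_scores, dim=-1)
    return F.kl_div(retriever_logprobs, reranker_probs, reduction="batchmean")


features = torch.randn(4, 16, 32)                # 4 queries x 16 candidates x 32-dim features
retriever = torch.nn.Linear(32, 1)               # stand-in for the dual-encoder scorer
reranker = torch.nn.Linear(32, 1)                # stand-in for the cross-encoder scorer
optimizer = torch.optim.Adam(list(retriever.parameters()) + list(reranker.parameters()), lr=1e-3)

retriever_scores = retriever(features).squeeze(-1)   # [4, 16]
reranker_scores = reranker(features).squeeze(-1)     # [4, 16]
labels = torch.zeros(4, dtype=torch.long)            # candidate 0 is the positive
loss = dynamic_listwise_distillation(retriever_scores, reranker_scores) \
       + F.cross_entropy(reranker_scores, labels)    # supervised listwise loss on the re-ranker
loss.backward()                                       # gradients flow into both modules
optimizer.step()
```

Because the re-ranker's distribution is not detached, the KL term updates the retriever and the re-ranker together, which is the mutual improvement the excerpt describes.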