RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking

Ren, Ruiyang; Qu, Yingqi; Liu, Jing; Zhao, Wayne Xin; She, Qiaoqiao; Wu, Hua; Wang, Haifeng; Wen, Ji-Rong

doi:10.48550/arxiv.2110.07367

Cited by 15 publications (37 citation statements)

References 25 publications


“…Instead of compressing multi-vector representations as we do, much recent work has focused on improving the quality of single-vector models, which are often very sensitive to the specifics of supervision. This line of work can be decomposed into three directions: (1) distillation of more expressive architectures (Hofstätter et al, 2020; including explicit denoising (Qu et al, 2021; Ren et al, 2021b), (2) hard negative sampling (Xiong et al, 2020; Zhan et al, 2020a), and…”

Section: Improving the Quality of Single-vector Representations (mentioning, confidence: 99%)

“…We then collect w-way tuples consisting of a query, a highly-ranked passage (or labeled positive), and one or more lower-ranked passages. Like RocketQAv2 (Ren et al, 2021b), we use a KL-Divergence loss to distill the cross-encoder's scores into the ColBERT architecture. We also employ in-batch negatives per GPU, where a cross-entropy loss is applied between the query and its positive against all passages corresponding to other queries in the same batch.…”

Section: Supervision (mentioning, confidence: 99%)
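The two losses in this excerpt can be sketched in plain Python. This is a minimal illustration, not ColBERTv2's training code: the function names are hypothetical, scores are plain floats standing in for model outputs, and details the excerpt does not give (temperatures, how the two losses are weighted) are left out, so the sketch simply computes them separately.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_distillation_loss(teacher_scores, student_scores):
    """KL(teacher || student) for one w-way tuple.

    Both lists score the same passages for one query (the highly ranked or
    labeled positive passage plus the lower-ranked ones), so the loss asks
    the student to reproduce the teacher's relative scores.
    """
    t_log = log_softmax(teacher_scores)
    s_log = log_softmax(student_scores)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_log, s_log))


def in_batch_negative_loss(score_matrix):
    """Mean cross-entropy with in-batch negatives.

    score_matrix[i][j] is the student's score for query i against the
    positive passage of query j. The diagonal holds the true pairs; every
    other passage in the (per-GPU) batch serves as a negative for query i.
    """
    losses = [-log_softmax(row)[i] for i, row in enumerate(score_matrix)]
    return sum(losses) / len(losses)


# Usage with made-up scores (not real model outputs).
teacher = [8.0, 2.0, 1.0, -1.0]  # hypothetical cross-encoder scores, 4-way tuple
student = [5.0, 4.0, 1.0, 0.0]   # hypothetical ColBERT scores, same passages
kl = kl_distillation_loss(teacher, student)
ib = in_batch_negative_loss([[9.0, 1.0], [2.0, 7.0]])  # hypothetical 2-query batch
```

Restricting the score matrix to the queries and positives held on one GPU is what "in-batch negatives per GPU" amounts to; the sketch just takes that matrix as input.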

“…Considering this challenge, it might seem more fruitful to focus instead on addressing the fragility of single-vector models by introducing new supervision paradigms for negative mining (Xiong et al, 2020), pretraining , and distillation (Qu et al, 2021). Indeed, recent single-vector models with highly-tuned supervision strategies (Ren et al, 2021b; Formal et al, 2021a) sometimes perform on-par or even better than "vanilla" late interaction models, and it is not necessarily clear whether late interaction architectures (with their fixed token-level inductive biases) admit similarly large gains from improved supervision.…”

Section: Introduction (mentioning, confidence: 99%)


ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Santhanam, Khattab, Saad-Falcon et al. 2021

Preprint


Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks. While many neural IR methods encode queries and documents into single-vector representations, late interaction models produce multi-vector representations at the granularity of each token and decompose relevance modeling into scalable token-level computations. This decomposition has been shown to make late interaction more effective, but it inflates the space footprint of these models by an order of magnitude. In this work, we introduce ColBERTv2, a retriever that couples an aggressive residual compression mechanism with a denoised supervision strategy to simultaneously improve the quality and space footprint of late interaction. We evaluate ColBERTv2 across a wide range of benchmarks, establishing state-of-the-art quality within and outside the training domain while reducing the space footprint of late interaction models by 5-8×.
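The two ideas in this abstract, token-level late interaction and residual compression, can be illustrated with a small pure-Python sketch. The MaxSim sum (each query token keeps its best-matching document token) is the standard ColBERT scoring rule. The compression part is a simplified, hypothetical stand-in: each vector is stored as the id of its nearest centroid plus a uniform b-bit bucket per dimension of the residual, over an assumed fixed residual range [lo, hi). ColBERTv2's actual centroid training and bucket boundaries are not described here, and the function names are illustrative.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit L2 norm."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: each query token keeps its best-matching document
    token, and the per-token maxima are summed into one relevance score."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def compress(v, centroids, bits=2, lo=-0.5, hi=0.5):
    """Hypothetical residual encoder: (nearest centroid id, residual bucket ids).

    With unit-norm vectors and centroids, the largest inner product picks
    the nearest centroid.
    """
    cid = max(range(len(centroids)), key=lambda i: dot(v, centroids[i]))
    step = (hi - lo) / (2 ** bits)
    codes = []
    for x, c in zip(v, centroids[cid]):
        r = min(max(x - c, lo), hi - 1e-12)  # clip the residual into [lo, hi)
        codes.append(int((r - lo) // step))  # bucket index in 0 .. 2**bits - 1
    return cid, codes


def decompress(cid, codes, centroids, bits=2, lo=-0.5, hi=0.5):
    """Rebuild an approximate vector: centroid plus the centre of each bucket."""
    step = (hi - lo) / (2 ** bits)
    return [c + lo + (k + 0.5) * step for c, k in zip(centroids[cid], codes)]


# Usage with toy 2-dimensional embeddings.
centroids = [[1.0, 0.0], [0.0, 1.0]]
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [normalize([1.0, 0.2]), [0.6, 0.8]]
exact = maxsim_score(query, doc)
cid, codes = compress(doc[0], centroids)
approx_doc = [decompress(cid, codes, centroids), doc[1]]
approx = maxsim_score(query, approx_doc)
```

Storing one centroid id plus a few bits per dimension, instead of a full floating-point value per dimension, is where the space saving comes from; the price is the bounded reconstruction error visible in the gap between `exact` and `approx`.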



“…In recent years, it becomes increasingly popular to apply KD for representation learning. For example, in [16,36,52], the representation models are distilled from the relevance scores predicted by the re-ranking models; and in [7,46], the representation models are pretrained by distilling knowledge from the pseudo labels annotated by the teacher models. It is generally believed that KD contributes to the representation learning thanks to the exploitation of massive unlabeled data [46] and label smoothing [48].…”

Section: Related Work (mentioning, confidence: 99%)

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

Xiao, Liu, Han et al. 2022

Preprint


Vector quantization (VQ) based ANN indexes, such as Inverted File System (IVF) and Product Quantization (PQ), have been widely applied to embedding based document retrieval thanks to their competitive time and memory efficiency. Originally, VQ is learned to minimize the reconstruction loss, i.e., the distortions between the original dense embeddings and the reconstructed embeddings after quantization. Unfortunately, such an objective is inconsistent with the goal of selecting ground-truth documents for the input query, which may cause severe loss of retrieval quality. Recent works identify such a defect, and propose to minimize the retrieval loss through contrastive learning. However, these methods rely heavily on queries with ground-truth documents, so their performance is limited by the insufficiency of labeled data. In this paper, we propose Distill-VQ, which unifies the learning of IVF and PQ within a knowledge distillation framework. In Distill-VQ, the dense embeddings are leveraged as "teachers", which predict the query's relevance to the sampled documents. The VQ modules are treated as the "students", which are learned to reproduce the predicted relevance, such that the reconstructed embeddings may fully preserve the retrieval result of the dense embeddings. By doing so, Distill-VQ is able to derive substantial training signals from the massive unlabeled data, which significantly contributes to the retrieval quality. We perform comprehensive explorations for the optimal conduct of knowledge distillation, which may provide useful insights for the learning of VQ based ANN index. We also experimentally show that the labeled data is no longer a necessity for high-quality vector quantization, which indicates Distill-VQ's strong applicability in practice. The evaluations are performed on MS MARCO and Natural Questions benchmarks, where Distill-VQ notably outperforms the SOTA VQ methods in Recall and MRR. Our code is available at https://github.com/staoxiao/LibVQ.
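A minimal sketch of the distillation objective described in this abstract, assuming product quantization with fixed, hand-written codebooks (the IVF part is omitted). Scores computed with the dense document embeddings act as the teacher and scores against the PQ-reconstructed documents act as the student; no relevance label appears anywhere. The KL form of the loss is an assumption, since the abstract says several ways of conducting the distillation were explored, and actually training the codebooks against this loss needs a gradient-based framework that this sketch does not attempt. All names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pq_encode(v, codebooks):
    """Product quantization: cut v into len(codebooks) equal slices and keep,
    for each slice, the index of the nearest codeword (squared L2 distance)."""
    sub = len(v) // len(codebooks)
    codes = []
    for k, book in enumerate(codebooks):
        part = v[k * sub:(k + 1) * sub]
        dists = [sum((x - y) ** 2 for x, y in zip(part, word)) for word in book]
        codes.append(dists.index(min(dists)))
    return codes


def pq_decode(codes, codebooks):
    """Concatenate the selected codewords into a reconstructed embedding."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def distill_vq_loss(query, doc_embs, codebooks):
    """KL(teacher || student) over the sampled documents for one query.

    Teacher: relevance computed with the original dense document embeddings.
    Student: relevance computed with their PQ reconstructions.
    """
    teacher = softmax([dot(query, d) for d in doc_embs])
    recon = [pq_decode(pq_encode(d, codebooks), codebooks) for d in doc_embs]
    student = softmax([dot(query, r) for r in recon])
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


# Usage with toy 4-dimensional embeddings and two sub-spaces.
docs = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
fine_books = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]  # 2 codewords each
coarse_books = [[[0.5, 0.5]], [[0.5, 0.5]]]                        # 1 codeword each
query = [1.0, 0.0, 0.0, 0.0]
exact_loss = distill_vq_loss(query, docs, fine_books)     # reconstruction is exact
coarse_loss = distill_vq_loss(query, docs, coarse_books)  # ranking information lost
```

With the fine codebooks the reconstructions equal the originals and the loss vanishes; with the coarse ones both documents collapse to the same code, the student can no longer rank them, and the loss becomes positive, which is the signal a learned quantizer would be trained to reduce.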


“…For example, the retriever can be improved by distilling from the ranker with a more capable architecture (Ding et al, 2020; Hofstätter et al, 2020), and the ranker can be improved with training instances generated from the retriever Huang et al, 2020). Based on these observations, Ren et al (2021) proposed the dynamic listwise distillation to jointly optimize the two modules in order to achieve mutual improvement and contribute to the final ranking performance.…”

Section: End-to-end IR Based on PTMs (mentioning, confidence: 99%)
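The dynamic listwise distillation mentioned here can be sketched for a single query under assumptions that go beyond this excerpt: the retriever and the re-ranker score the same candidate list, each list is turned into a distribution with softmax, a KL term KL(retriever || re-ranker) ties the two distributions together, and the re-ranker also receives a cross-entropy loss on the labeled positive (assumed to sit at index 0). The KL direction and the plain sum of the two terms are assumptions of this sketch, and the function names are hypothetical. The analytic gradients show why such a distillation can be called dynamic: the KL term generally produces nonzero gradients for both score lists, so both modules are updated rather than one acting as a frozen teacher.

```python
import math


def log_softmax(scores):
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def listwise_losses(retriever_scores, reranker_scores, pos_index=0):
    """Return (kl, sup) for one query's candidate list.

    kl:  KL(p || q), p = softmax of retriever scores, q = softmax of
         re-ranker scores over the same candidates (direction assumed).
    sup: cross-entropy of the re-ranker on the labeled positive.
    """
    log_p = log_softmax(retriever_scores)
    log_q = log_softmax(reranker_scores)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    sup = -log_q[pos_index]
    return kl, sup


def kl_gradients(retriever_scores, reranker_scores):
    """Analytic gradients of the KL term with respect to both score lists."""
    log_p = log_softmax(retriever_scores)
    log_q = log_softmax(reranker_scores)
    p = [math.exp(x) for x in log_p]
    q = [math.exp(x) for x in log_q]
    f = [lp - lq for lp, lq in zip(log_p, log_q)]
    kl = sum(pi * fi for pi, fi in zip(p, f))
    grad_retriever = [pi * (fi - kl) for pi, fi in zip(p, f)]  # dKL/d(retriever score)
    grad_reranker = [qi - pi for pi, qi in zip(p, q)]          # dKL/d(re-ranker score)
    return grad_retriever, grad_reranker


# Usage: one positive (index 0) and two retrieved negatives, made-up scores.
retriever = [2.0, 0.5, -1.0]
reranker = [1.0, 1.5, 0.0]
kl, sup = listwise_losses(retriever, reranker)
total = kl + sup  # unweighted sum: an assumption of this sketch
g_ret, g_rr = kl_gradients(retriever, reranker)
```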

Pre-training Methods in Information Retrieval

Fan, Xie, Cai et al. 2021

Preprint


The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to the user's information need. Recently, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model sizes, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Since there have been a large number of works dedicated to the

