Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3463076

Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation

Abstract: BERT-based Neural Ranking Models (NRMs) can be classified according to how the query and document are encoded through BERT's self-attention layers: bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all the documents can be pre-processed before query time, but their performance is inferior to that of cross-encoder models. Both models utilize a ranker that receives BERT representations as input and generates a relevance score as output. In this work, we propose a method …
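As a rough illustration of the bi-encoder/cross-encoder distinction the abstract draws, the sketch below scores a query–document pair both ways with a generic BERT encoder. The model name, the [CLS] pooling, and the dot-product/linear rankers are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of the two scoring patterns contrasted in the abstract.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def bi_encoder_score(query: str, document: str) -> float:
    # Query and document are encoded independently, so document vectors
    # can be pre-computed offline; only the query is encoded at query time.
    q = tokenizer(query, return_tensors="pt", truncation=True)
    d = tokenizer(document, return_tensors="pt", truncation=True)
    with torch.no_grad():
        q_vec = encoder(**q).last_hidden_state[:, 0]   # [CLS] representation
        d_vec = encoder(**d).last_hidden_state[:, 0]
    return torch.matmul(q_vec, d_vec.T).item()          # simple dot-product ranker

def cross_encoder_score(query: str, document: str) -> float:
    # Query and document attend to each other in every self-attention layer,
    # which is more accurate but must be run per (query, document) pair online.
    pair = tokenizer(query, document, return_tensors="pt", truncation=True)
    with torch.no_grad():
        cls = encoder(**pair).last_hidden_state[:, 0]
    ranker = torch.nn.Linear(cls.size(-1), 1)            # untrained head, for shape only
    return ranker(cls).item()
```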

Cited by 16 publications (10 citation statements); References 10 publications
“…While BERT and CONV-KNRM also perform decently well, ColBERT doesn't perform as well as the other models. Our numbers for ColBERT are consistent with another recent study [5] that also trained ColBERT on Robust04 and ClueWeb09-Cat-B.…”
Section: Relevance and Inference Efficiency (supporting)
confidence: 89%
“…Following previous work [34], for Robust04 and ClueWeb09-Cat-B, we re-rank the 150 documents per query retrieved by the Indri initial ranking in the Lemur system. For the MS MARCO Dev set, we follow the common practice of re-ranking the top 1000 passages per query retrieved by BM25.…”
Section: Relevance and Inference Efficiency (mentioning)
confidence: 99%
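The setup quoted above is the standard two-stage pipeline: a cheap lexical retriever (Indri or BM25) produces a fixed-size candidate list, and the neural ranking model re-scores only that list. A minimal sketch of the pattern, assuming the rank_bm25 package and a placeholder neural_score callable — both are illustrative choices, not the cited papers' exact tooling:

```python
# Sketch of two-stage re-ranking: BM25 retrieves a candidate list (e.g. top
# 1000 passages per query), and a neural model re-scores only those candidates.
# `neural_score` is a hypothetical stand-in for any neural ranking model.
from rank_bm25 import BM25Okapi

def rerank(query: str, corpus: list[str], neural_score, k: int = 1000) -> list[str]:
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores(query.split())
    # First stage: keep only the top-k BM25 candidates.
    candidates = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    # Second stage: the (expensive) neural ranker orders just those candidates.
    return sorted((corpus[i] for i in candidates),
                  key=lambda doc: neural_score(query, doc), reverse=True)
```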
“…Another popular choice is ColBERT (Khattab and Zaharia, 2020), whose structure is more similar to dual-encoders and thus allows KD on in-batch negative examples. Besides, a handful of studies also try to improve the performance with multi-teacher distillation (Choi et al., 2021; Hofstätter et al., 2021). However, none of them investigate how to more effectively distill the knowledge of teachers into a student with a different architecture.…”
Section: Knowledge Distillation For Retrievers (mentioning)
confidence: 99%
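The multi-teacher distillation mentioned here generally trains a student ranker against soft targets aggregated from several teachers. Below is a minimal sketch under the assumption of mean-aggregated teacher scores and an MSE objective; it illustrates the general idea only, not the specific losses of Choi et al. (2021) or Hofstätter et al. (2021).

```python
# Illustrative multi-teacher distillation objective: the student's relevance
# scores are regressed toward the average of several teachers' scores.
# Mean aggregation and the MSE loss are assumptions made for illustration.
import torch

def multi_teacher_distill_loss(student_scores: torch.Tensor,
                               teacher_scores: list[torch.Tensor]) -> torch.Tensor:
    # teacher_scores: one tensor of relevance scores per teacher, each with the
    # same shape as student_scores (e.g. [batch_size]).
    target = torch.stack(teacher_scores, dim=0).mean(dim=0)
    return torch.nn.functional.mse_loss(student_scores, target.detach())
```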
“…Cross-encoder. Researchers have developed passage re-ranking models (i.e., re-rankers) to further improve end-to-end QA after the retrieval of candidate passages (Choi et al., 2021; Ren et al., 2021b). Using a cross-encoder as a re-ranker usually achieves superior performance.…”
Section: Preliminaries (mentioning)
confidence: 99%