Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning

Jung, Euna; Choi, Jaekeol; Rhee, Wonjong

doi:10.1145/3485447.3511978

Cited by 11 publications

(8 citation statements)

References 6 publications

“…Thus, our PEFA is also applicable to both pre-trained and fine-tuned ERMs, even ones initialized from black-box LLMs. Note that PEFA is orthogonal and complement to most existing literature that aims to obtain better pre-trained or fine-tuned ERMs at the learning stage, including recent studies of the parameter-efficient fine-tuning of ERMs [28,37,44]. Finally, for the ease of discussion, we assume embeddings obtained from ERMs are unit-norm (i.e., ℓ2 normalized), hence the inner product is equivalent to the cosine similarity.…”

Section: Problem Statement (mentioning)

confidence: 99%

PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models

Chang, Jiang, Zhang et al. 2024

Proceedings of the 17th ACM International Conference on Web Search and Data Mining

Embedding-based Retrieval Models (ERMs) have emerged as a promising framework for large-scale text retrieval problems due to powerful large language models. Nevertheless, fine-tuning ERMs to reach state-of-the-art results can be expensive due to the extreme scale of data as well as the complexity of multi-stages pipelines (e.g., pre-training, fine-tuning, distillation). In this work, we propose the PEFA framework, namely ParamEter-Free Adapters, for fast tuning of ERMs without any backward pass in the optimization. At index building stage, PEFA equips the ERM with a non-parametric k-nearest neighbor (kNN) component. At inference stage, PEFA performs a convex combination of two scoring functions, one from the ERM and the other from the kNN. Based on the neighborhood definition, PEFA framework induces two realizations, namely PEFA-XL (i.e., extra large) using double ANN indices and PEFA-XS (i.e., extra small) using a single ANN index. Empirically, PEFA achieves significant improvement on two retrieval applications. For document retrieval, regarding Recall@100 metric, PEFA improves not only pre-trained ERMs on Trivia-QA by an average of 13.2%, but also fine-tuned ERMs on NQ-320K by an average of 5.5%, respectively. For product search, PEFA improves the Recall@100 of the fine-tuned ERMs by an average of 5.3% and 14.5%, for PEFA-XS and PEFA-XL, respectively. Our code is available at https://github.com/amzn/pecos/tree/mainline/examples/pefa-wsdm24.
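
As a rough illustration of the inference step described in this abstract, the sketch below combines an ERM score and a kNN score with a convex weight. It is not the PEFA implementation: the function names, the weight `lam`, and in particular the way the kNN score is computed (summing the similarities of the k nearest training queries that are labeled with the candidate document) are assumptions made for this example. Embeddings are unit-normalized, so the inner product equals the cosine similarity, as the citing statement notes.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product; equals cosine similarity for unit-norm inputs."""
    return sum(x * y for x, y in zip(a, b))


def knn_score(query, doc_id, train_queries, train_labels, k):
    """Hypothetical kNN score: sum of similarities of the k nearest
    training queries whose relevant-document set contains doc_id."""
    scored = sorted(
        ((dot(query, tq), labels) for tq, labels in zip(train_queries, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    return sum(sim for sim, labels in scored if doc_id in labels)


def combined_score(query, doc, doc_id, train_queries, train_labels, lam=0.5, k=1):
    """Convex combination of the ERM score and the kNN score (0 <= lam <= 1)."""
    erm = dot(query, doc)
    knn = knn_score(query, doc_id, train_queries, train_labels, k)
    return lam * erm + (1.0 - lam) * knn


# Toy data (made up for illustration only).
query = l2_normalize([1.0, 0.0])
doc = l2_normalize([1.0, 1.0])
train_queries = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
train_labels = [{"d1"}, {"d2"}]

score = combined_score(query, doc, "d1", train_queries, train_labels, lam=0.5, k=1)
```

Setting `lam=1.0` recovers the plain ERM score, so the kNN component only adds information on top of the existing model and needs no gradient updates.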

“…in a Siamese fashion. Jung et al. [15] use a semi-Siamese setting, where the encoders do share parameters as well, but they are adapted to their specific role (query or document encoding) using light fine-tuning methods. We are not aware of any approaches that employ heterogeneous models, where the two encoders do not share the same model architecture and initial weights.…”

Section: Neural Retrieval and Ranking (mentioning)

confidence: 99%
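
The semi-Siamese idea in the statement above can be sketched as one shared weight matrix plus a small role-specific adjustment per encoder. This is a toy under stated assumptions, not the method of Jung et al.: the class name, the single linear layer, and the choice of a low-rank additive update as the "light fine-tuning" component are all hypothetical and picked only for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Inner product used as the query-document relevance score."""
    return sum(x * y for x, y in zip(a, b))


class SemiSiameseEncoder:
    """Shared base weights with one lightweight adapter per role (hypothetical)."""

    def __init__(self, shared_weights):
        self.shared = shared_weights  # shared by both roles
        self.adapters = {}            # role -> (down, up) low-rank factors

    def add_role(self, role, down, up):
        """Register the role-specific low-rank factors (assumed trainable)."""
        self.adapters[role] = (down, up)

    def encode(self, role, x):
        """Shared projection plus the role-specific low-rank correction."""
        base = matvec(self.shared, x)
        down, up = self.adapters[role]
        delta = matvec(up, matvec(down, x))
        return [b + d for b, d in zip(base, delta)]


# Toy weights (made up): identity base, rank-1 adapters.
encoder = SemiSiameseEncoder([[1.0, 0.0], [0.0, 1.0]])
encoder.add_role("query", down=[[1.0, 0.0]], up=[[0.5], [0.0]])
encoder.add_role("document", down=[[0.0, 0.0]], up=[[0.0], [0.0]])

query_vec = encoder.encode("query", [1.0, 2.0])
doc_vec = encoder.encode("document", [1.0, 2.0])
relevance = dot(query_vec, doc_vec)
```

The same input yields different representations depending on the role, while almost all parameters stay shared between the query and document encoders.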

“…(1) The encoders may be unable to fully adapt to the characteristics of their respective inputs. For example, queries are usually short and concise whereas documents are longer and more complex [15]. (2) The query encoder has the same number of parameters as the document encoder by design.…”

Section: Heterogeneous Dual-encoders (mentioning)

confidence: 99%

“…The relevance of a query-document pair is then computed as the dot product of the query and document representation vectors. This is referred to as two-tower, bi-encoder or dual-encoder architecture and has been used for retrieval [14,16,22] and re-ranking [15,20,35]. Typically, the query and document encoder either (1) are architecturally identical and initialized using the same pre-trained model or (2) even share their weights in a Siamese fashion.…”

Section: Introduction (mentioning)

confidence: 99%

“…Queries are often short and concise, suggesting that query encoders need not be overly complex. There is evidence showing that it is beneficial to adapt the query and document encoders to their respective characteristics [15]. Extreme implementations of this are by Zhuang and Zuccon [34,35], who use sparse parameter-free query encoders with little drop in performance, but these models are only used for re-ranking.…”

Section: Introduction (mentioning)

confidence: 99%

Distribution-Aligned Fine-Tuning for Efficient Neural Retrieval

Leonhardt, Jahnke, Anand 2022

Preprint

Dual-encoder-based neural retrieval models achieve appreciable performance and complement traditional lexical retrievers well due to their semantic matching capabilities, which makes them a common choice for hybrid IR systems. However, these models exhibit a performance bottleneck in the online query encoding step, as the corresponding query encoders are usually large and complex Transformer models. In this paper we investigate heterogeneous dual-encoder models, where the two encoders are separate models that do not share parameters or initializations. We empirically show that heterogeneous dual-encoders are susceptible to collapsing representations, causing them to output constant trivial representations when they are fine-tuned using a standard contrastive loss due to a distribution mismatch. We propose DAFT, a simple two-stage fine-tuning approach that aligns the two encoders in order to prevent them from collapsing. We further demonstrate how DAFT can be used to train efficient heterogeneous dual-encoder models using lightweight query encoders.