Using Optimal Transport as Alignment Objective for fine-tuning Multilingual Contextualized Embeddings

Alqahtani, Sawsan; Lalwani, Garima; Zhang, Yi; Romeo, Salvatore; Mansour, Saab

doi:10.48550/arxiv.2110.02887

Cited by 1 publication

(2 citation statements)

References 22 publications

(32 reference statements)


“…For tasks involving cross-lingual settings, Nguyen and Luu [29] employed OT distance as a part of the loss function in a knowledge distillation framework for improving the cross-lingual summarization. Alqahtani et al [1] incorporated OT as an alignment objective to improve the multilingual word representations. In this work, we explore transferring the retrieval knowledge in a cross-lingual setting via OT.…”

Section: Optimal Transport (mentioning)

confidence: 99%

“…Therefore, we approximate the calculation of $D(\mathrm{q}, q)$ as an optimal transport problem. First, we assign equal mass to the tokens in $\mathrm{q}$ and $q$ by defining a uniform source probability distribution, $\mu_s$, on $\mathrm{q}$ and a uniform target probability distribution, $\mu_t$, on $q$: $\mu_s(i) = \frac{1}{L}$ and $\mu_t(j) = \frac{1}{L}$, where $1 \le i, j \le L$. The set of transportation plans between these two distributions is then the set of doubly stochastic matrices $\mathrm{P}$ defined as…”

Section: Optimal Transport Knowledge Distillation (mentioning)

confidence: 99%
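
The setup in the statement above (uniform mass 1/L on each of the L tokens on both sides, and a transport plan between the two distributions) can be illustrated with a small numerical sketch. The quoted definition of the plan set is truncated, so the code below assumes the standard optimal-transport marginal constraints (each row and each column of the plan sums to 1/L, which makes L times the plan doubly stochastic). The cosine-distance cost, the random token embeddings, the entropic regularization, and the helper names `uniform_marginals` and `sinkhorn_plan` are all assumptions made for illustration; the Sinkhorn iteration is a common approximate solver and is not confirmed here as the cited authors' method.

```python
import numpy as np


def uniform_marginals(L):
    """Equal mass 1/L on each of the L source and L target tokens."""
    mu_s = np.full(L, 1.0 / L)
    mu_t = np.full(L, 1.0 / L)
    return mu_s, mu_t


def sinkhorn_plan(cost, mu_s, mu_t, reg=0.5, n_iters=1000):
    """Entropy-regularized transport plan via Sinkhorn scaling (assumed solver)."""
    K = np.exp(-cost / reg)          # Gibbs kernel of the cost matrix
    u = np.ones_like(mu_s)
    for _ in range(n_iters):
        v = mu_t / (K.T @ u)         # rescale to match column marginals
        u = mu_s / (K @ v)           # rescale to match row marginals
    return u[:, None] * K * v[None, :]


# Hypothetical token embeddings for the two sequences (random stand-ins).
rng = np.random.default_rng(0)
L = 4
src = rng.normal(size=(L, 8))
tgt = rng.normal(size=(L, 8))

# Cosine distance as the transport cost (an assumption, not from the source).
src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
cost = 1.0 - src_n @ tgt_n.T

mu_s, mu_t = uniform_marginals(L)
plan = sinkhorn_plan(cost, mu_s, mu_t)

# Transport cost under the plan, usable as a distance between the sequences.
ot_distance = float((plan * cost).sum())
```

With uniform marginals, `L * plan` has rows and columns that each sum to 1, which is the doubly stochastic structure the statement refers to.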


Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation

Huang, Yu, Allan

2023, Preprint


Benefiting from transformer-based pre-trained language models, neural ranking models have made significant progress. More recently, the advent of multilingual pre-trained language models provides great support for designing neural cross-lingual retrieval models. However, due to unbalanced pre-training data in different languages, multilingual language models have already shown a performance gap between high- and low-resource languages in many downstream tasks. Cross-lingual retrieval models built on such pre-trained models can inherit this language bias, leading to suboptimal results for low-resource languages. Moreover, unlike the English-to-English retrieval task, where large-scale training collections for document ranking such as MS MARCO are available, the lack of cross-lingual retrieval data for low-resource languages makes training cross-lingual retrieval models more challenging. In this work, we propose OPTICAL: Optimal Transport distillation for low-resource Cross-lingual information retrieval. To transfer a model from high- to low-resource languages, OPTICAL formulates the cross-lingual token alignment task as an optimal transport problem to learn from a well-trained monolingual retrieval model. By separating the cross-lingual knowledge from the knowledge of query-document matching, OPTICAL only needs bitext data for distillation training, which is more feasible for low-resource languages. Experimental results show that, with minimal training data, OPTICAL significantly outperforms strong baselines on low-resource languages, including neural machine translation.

CCS Concepts: Information systems → Information retrieval; Multilingual and cross-lingual retrieval; Retrieval models and ranking.
