Learning Cross-Lingual IR from an English Retriever

Li, Yulong; Franz, Martin; Sultan, Md Arafat; Iyer, Bhavani; Lee, Young-Suk; Sil, Avirup

doi:10.18653/v1/2022.naacl-main.329

“…These CLIR collections contain the correct translation knowledge, but their retrieval knowledge is synthetically generated. On the other hand, some CLIR collections are created by translating a query from a commercial search engine into the target languages using NMT models [4,19]. The relevance judgments are more credible in these collections since they are extracted from the query log.…”

Section: Related Work, 2.1 Neural Matching Models for CLIR (mentioning)

confidence: 99%

“…This way, the teacher model's knowledge can be transferred into the student model. The idea of knowledge distillation is wildly used in the field of computer vision [20,42,46], natural language processing [31,34] and information retrieval [15,19,25]. Our method is also an extension of knowledge distillation.…”

Section: Knowledge Distillation (mentioning)

confidence: 99%

Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation

Huang, Yu, Allan

2023

Preprint

Benefiting from transformer-based pre-trained language models, neural ranking models have made significant progress. More recently, the advent of multilingual pre-trained language models has provided strong support for designing neural cross-lingual retrieval models. However, due to unbalanced pre-training data across languages, multilingual language models have shown a performance gap between high- and low-resource languages in many downstream tasks. Cross-lingual retrieval models built on such pre-trained models can inherit this language bias, leading to suboptimal results for low-resource languages. Moreover, unlike the English-to-English retrieval task, where large-scale training collections for document ranking such as MS MARCO are available, the lack of cross-lingual retrieval data for low-resource languages makes training cross-lingual retrieval models more challenging. In this work, we propose OPTICAL: Optimal Transport distillation for low-resource Cross-lingual information retrieval. To transfer a model from high- to low-resource languages, OPTICAL formulates the cross-lingual token alignment task as an optimal transport problem to learn from a well-trained monolingual retrieval model. By separating the cross-lingual knowledge from the knowledge of query-document matching, OPTICAL needs only bitext data for distillation training, which is more feasible for low-resource languages. Experimental results show that, with minimal training data, OPTICAL significantly outperforms strong baselines on low-resource languages, including neural machine translation.

CCS CONCEPTS: Information systems → Information retrieval; Multilingual and cross-lingual retrieval; Retrieval models and ranking.

“…Knowledge distillation (Hinton et al, 2014) is a well known model compression method usually to train a small model (called student) leveraging outputs from a more complex model (called teacher) as part of loss functions to be minimized. Recent knowledge distillation approaches are more complex e.g., using intermediate layers' outputs (embeddings or feature maps) besides the final output (logits) of teacher models with auxiliary module branches attached to teacher and/or student models during training (Kim et al, 2018;Zhang et al, 2020;Chen et al, 2021), using multiple teachers (Mirzadeh et al, 2020;Matsubara et al, 2022b), and training multilingual or non-English models solely with an English teacher model (Reimers and Gurevych, 2020;Li et al, 2022b;Gupta et al, 2023).…”

Section: Introduction (mentioning)

confidence: 99%
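The citation statement above describes knowledge distillation as training a student with the teacher's outputs as part of the loss. A minimal sketch of that idea follows, assuming the common soft-target formulation (temperature-scaled KL divergence plus cross-entropy on the true label). The function names, `temperature`, and `alpha` are illustrative assumptions; they are not taken from any of the papers listed here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft-target and a hard-label term.

    Soft term: KL(teacher || student) on temperature-softened distributions,
    scaled by temperature**2 (a common convention, assumed here).
    Hard term: cross-entropy of the student against the true label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    cross_entropy = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy


# When the student already matches the teacher, only the hard-label term remains.
matched = distillation_loss([2.0, 0.0], [2.0, 0.0], label=0)
# A student that disagrees with the teacher pays an additional soft-target penalty.
mismatched = distillation_loss([0.0, 0.0], [2.0, 0.0], label=0)
baseline = distillation_loss([0.0, 0.0], [0.0, 0.0], label=0)
```

In this sketch `alpha` controls how much the student follows the teacher versus the labels. Setting `alpha=1.0` would train on teacher outputs alone, which is closer in spirit to the label-free settings mentioned in the statement.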

torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP

Matsubara

2023

Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

Reproducibility in scientific work has become increasingly important in research communities such as machine learning, natural language processing, and computer vision, owing to the rapid development of these research domains supported by recent advances in deep learning. In this work, we present a significantly upgraded version of torchdistill, a modular, coding-free deep learning framework whose initial release supported only image classification and object detection tasks for reproducible knowledge distillation experiments. To demonstrate that the upgraded framework can support more tasks with third-party libraries, we reproduce the GLUE benchmark results of BERT models using a script based on the upgraded torchdistill, harmonizing with various Hugging Face libraries. All 27 fine-tuned BERT models and the configurations to reproduce the results are published at Hugging Face, and the model weights have already been widely used in research communities. We also reimplement popular small-sized models and new knowledge distillation methods and perform additional experiments for computer vision tasks.

torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation

Matsubara

2021

Reproducible Research in Pattern Recognition

Cited by 12 publications

References 7 publications
