Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.329

Learning Cross-Lingual IR from an English Retriever

Abstract: We present DR.DECR (Dense Retrieval with Distillation-Enhanced Cross-Lingual Representation), a new cross-lingual information retrieval (CLIR) system trained using multi-stage knowledge distillation (KD). The teacher of DR.DECR relies on a highly effective but computationally expensive two-stage inference process consisting of query translation and monolingual IR, while the student, DR.DECR, executes a single CLIR step. We teach DR.DECR powerful multilingual representations as well as CLIR by optimizing two co…
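The abstract's core idea, distilling an English retriever's representations into a multilingual student, can be illustrated with a short sketch. This is a minimal, hedged example of representation-level KD, not DR.DECR's actual training code; the checkpoints, the mean pooling, and the MSE objective are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pool(hidden, mask):
    # Average token embeddings, ignoring padding positions.
    m = mask.unsqueeze(-1).float()
    return (hidden * m).sum(1) / m.sum(1)

# Assumed checkpoints: any English retriever / multilingual encoder
# pair with matching hidden sizes (768 here) would do.
teacher_name = "sentence-transformers/all-mpnet-base-v2"
student_name = "xlm-roberta-base"
t_tok = AutoTokenizer.from_pretrained(teacher_name)
s_tok = AutoTokenizer.from_pretrained(student_name)
teacher = AutoModel.from_pretrained(teacher_name).eval()
student = AutoModel.from_pretrained(student_name)

# A parallel query pair: English side for the teacher,
# non-English side for the student.
en = t_tok(["where is the eiffel tower"], return_tensors="pt", padding=True)
fr = s_tok(["où se trouve la tour eiffel"], return_tensors="pt", padding=True)

with torch.no_grad():  # the teacher stays frozen
    t_emb = mean_pool(teacher(**en).last_hidden_state, en["attention_mask"])

s_emb = mean_pool(student(**fr).last_hidden_state, fr["attention_mask"])
loss = F.mse_loss(s_emb, t_emb)  # pull the student toward the English teacher
loss.backward()
```

In the paper's setting the supervision also includes a retrieval-level KD objective; this sketch covers only the representation-alignment half.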

Cited by 12 publications (6 citation statements) · References 7 publications
“…These CLIR collections contain the correct translation knowledge, but their retrieval knowledge is synthetically generated. On the other hand, some CLIR collections are created by translating queries from a commercial search engine into the target languages using NMT models [4,19]. The relevance judgments in these collections are more credible, since they are extracted from the query log.…”
Section: Related Work, 2.1 Neural Matching Models for CLIR
Citation type: mentioning (confidence: 99%)
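The query-translation step mentioned in this statement can be reproduced with any off-the-shelf NMT model. Below is a minimal sketch using a Hugging Face MarianMT checkpoint; the model name and language pair are illustrative assumptions, not the setup used in the cited collections [4,19].

```python
from transformers import MarianMTModel, MarianTokenizer

# Example English -> German translation model (assumed checkpoint).
name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

queries = ["what is dense retrieval"]
batch = tokenizer(queries, return_tensors="pt", padding=True)
out = model.generate(**batch)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```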
“…This way, the teacher model's knowledge can be transferred into the student model. The idea of knowledge distillation is widely used in the fields of computer vision [20,42,46], natural language processing [31,34], and information retrieval [15,19,25]. Our method is also an extension of knowledge distillation.…”
Section: Knowledge Distillation
Citation type: mentioning (confidence: 99%)
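The teacher-to-student transfer this statement describes is, in its classic form, a loss term that pulls the student's predictions toward the teacher's. A minimal sketch of that soft-target KD loss (in the style of Hinton et al.) follows; the temperature value and toy tensors are illustrative.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes stable.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * T * T

# Toy usage with random logits standing in for real model outputs.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = kd_loss(student_logits, teacher_logits)
loss.backward()
```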
“…Knowledge distillation (Hinton et al., 2014) is a well-known model compression method, usually used to train a small model (the student) by leveraging outputs from a more complex model (the teacher) as part of the loss functions to be minimized. Recent knowledge distillation approaches are more complex, e.g., using intermediate layers' outputs (embeddings or feature maps) in addition to the final output (logits) of teacher models, with auxiliary module branches attached to the teacher and/or student models during training (Kim et al., 2018; Zhang et al., 2020; Chen et al., 2021); using multiple teachers (Mirzadeh et al., 2020; Matsubara et al., 2022b); or training multilingual or non-English models solely with an English teacher model (Reimers and Gurevych, 2020; Li et al., 2022b; Gupta et al., 2023).…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
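The intermediate-layer variant this passage describes (matching embeddings or feature maps, with an auxiliary branch attached to the student) can be sketched as follows. The hidden sizes and the linear projection are illustrative assumptions in the FitNets style, not the exact architectures of the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintBranch(nn.Module):
    # Auxiliary module that projects a student hidden state into the
    # teacher's representation space before matching.
    def __init__(self, d_student=512, d_teacher=768):
        super().__init__()
        self.proj = nn.Linear(d_student, d_teacher)

    def forward(self, student_hidden):
        return self.proj(student_hidden)

branch = HintBranch()
student_hidden = torch.randn(4, 512, requires_grad=True)  # student intermediate output
teacher_hidden = torch.randn(4, 768)                       # teacher intermediate output

# Hint loss on intermediate representations; in practice this would be
# combined with the final-logit KD loss sketched above.
hint_loss = F.mse_loss(branch(student_hidden), teacher_hidden)
hint_loss.backward()
```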