Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval 2020
DOI: 10.1145/3397271.3401296
Distilling Knowledge for Fast Retrieval-based Chat-bots

Cited by 24 publications (15 citation statements)
References 6 publications
“…by comparing Margin-MSE with different knowledge distillation losses using the same training data. We compare our approach with a pointwise MSE loss, defined as follows: This is a standard approach already used by Vakili Tahami et al [39] and Li et al [24]. Additionally, we utilize a weighted RankNet loss, where we weight the samples in a batch according to the teacher margin:…”
Section: Optimization Study (mentioning)
confidence: 99%
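
The quoted comparison contrasts a pointwise MSE distillation loss with the margin-based variant. As a rough sketch of the two objectives (my notation, not the cited papers': s_t and s_s are teacher and student relevance scores, q a query, d+ / d- a relevant and a non-relevant passage, B a training batch):

    % Pointwise MSE distillation, as attributed above to Vakili Tahami et al [39] and Li et al [24]
    \mathcal{L}_{\mathrm{MSE}} = \frac{1}{|B|} \sum_{(q,d) \in B} \left( s_t(q,d) - s_s(q,d) \right)^2

    % Margin-MSE: the student is trained to reproduce the teacher's score margin
    \mathcal{L}_{\mathrm{MarginMSE}} = \frac{1}{|B|} \sum_{(q,d^+,d^-) \in B} \left( \left[ s_t(q,d^+) - s_t(q,d^-) \right] - \left[ s_s(q,d^+) - s_s(q,d^-) \right] \right)^2

The weighted RankNet variant mentioned in the quote keeps the usual pairwise RankNet loss but scales each pair's contribution by the teacher margin s_t(q,d+) - s_t(q,d-).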
“…Apart from specifically training dense retrieval models, knowledge distillation gains popularity, with general-purpose BERT-style models [17,33] as well as a range of applications in IR: from sequential recommendation models [35], BERT-based retrieval chatbots [36], BERT-based Question Answering [15], reducing the size of the BERT CAT passage re-ranking model [4,11], to dense keyword matching in sponsored search [24].…”
Section: Other Dense Retrieval Training Methods (mentioning)
confidence: 99%
“…Meanwhile, Knowledge distillation (Hinton et al, 2015) transfers the knowledge of the teacher model into the student model by matching the student logits with softened teacher logits. Knowledge distillation especially designed for specific tasks or model architectures exists, such as sequence generation task (Kim and Rush, 2016; Lin et al, 2020a), retrieval models (Vakili Tahami et al, 2020) and for transformer architectures.…”
Section: Knowledge Transfer From Large Models (mentioning)
confidence: 99%
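
For reference, the softened-logits distillation of Hinton et al (2015) that this quote refers to can be sketched as follows (T is the temperature, z_t and z_s the teacher and student logits, alpha a mixing weight; the symbols are my own, not the citing paper's):

    % Hard-label cross-entropy combined with a temperature-softened KL term
    \mathcal{L}_{\mathrm{KD}} = \alpha \, \mathrm{CE}\!\left( y, \mathrm{softmax}(z_s) \right) + (1 - \alpha) \, T^2 \, \mathrm{KL}\!\left( \mathrm{softmax}(z_t / T) \,\middle\|\, \mathrm{softmax}(z_s / T) \right)

The T^2 factor keeps the gradient magnitude of the soft term comparable across temperatures, which is why it appears in the original formulation.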