2021
DOI: 10.48550/arxiv.2111.03664
Preprint

Oracle Teacher: Towards Better Knowledge Distillation

Abstract: Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ the teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teacher model for KD, namely Oracle Teacher, that utilizes the embeddings of both the source inputs and the output label…

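For context, the conventional KD setup that the abstract contrasts against combines a hard-label cross-entropy term with a softened teacher-student matching term (Hinton et al.). Below is a minimal sketch of that baseline objective in PyTorch; the temperature, weighting factor, and function name are illustrative assumptions, and this does not implement the Oracle Teacher method itself.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
    """Conventional KD loss: weighted sum of hard-label cross-entropy and
    softened KL divergence between teacher and student output distributions."""
    # Hard-label term: standard cross-entropy with the ground-truth targets.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL divergence between temperature-softened distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl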
Cited by 0 publications
References 30 publications (37 reference statements)