2022
DOI: 10.48550/arxiv.2202.12488
Preprint

Learn From the Past: Experience Ensemble Knowledge Distillation

Abstract: Traditional knowledge distillation transfers "dark knowledge" of a pre-trained teacher network to a student network, and ignores the knowledge in the training process of the teacher, which we call teacher's experience. However, in realistic educational scenarios, learning experience is often more important than learning results. In this work, we propose a novel knowledge distillation method by integrating the teacher's experience for knowledge transfer, named experience ensemble knowledge distillation (EEKD). …

Citations: cited by 1 publication (3 citation statements)
References: 36 publications (79 reference statements)
“…These past elements are recorded and will be replaced based on the first-in, first-out principle when the memory bank or queue reaches its capacity. The iterative teacher model maintains historical DEs [36], [53]-[58]. Feng et al. [36] perform teacher learning by incorporating multiple historical model checkpoints in a parallel manner.…”
Section: Aspect Of Storage Form
Citation type: mentioning
confidence: 99%
“…(2) Simple Moving Average (SMA) is a type of arithmetic average that is calculated by adding recent data and dividing the sum by the number of time periods. The works in [14], [25]-[33], [57], [58], [135] all use SMA to obtain the average multiple checkpoints, predictions or gradients of features during the learning process. SMA has been proven to result in wider optima and better generalization.…”
Section: Aspect Of Storage Form
Citation type: mentioning
confidence: 99%
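
The SMA described in the statement above can be illustrated with a small sketch. It assumes each checkpoint is a flat list of floats and that only the most recent `window` checkpoints are kept, with the oldest dropped first (first-in, first-out). The function name `sma_of_checkpoints` and the toy data are hypothetical; this is not the procedure of any of the cited works, only a minimal example of the averaging itself.

```python
from collections import deque


def sma_of_checkpoints(checkpoints, window):
    """Yield the simple moving average of the most recent checkpoints.

    Each checkpoint is assumed to be a flat list of floats (an illustrative
    stand-in for model parameters).
    """
    # A bounded deque drops its oldest entry once `window` items are stored,
    # which gives first-in, first-out replacement.
    recent = deque(maxlen=window)
    for params in checkpoints:
        recent.append(params)
        n = len(recent)
        # Average each parameter position across the stored checkpoints.
        yield [sum(values) / n for values in zip(*recent)]


# Toy data: four checkpoints with two parameters each.
checkpoints = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
averaged = list(sma_of_checkpoints(checkpoints, window=2))
```

With `window=2`, each output after the first is the element-wise mean of the current checkpoint and the one before it.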