2020 25th International Conference on Pattern Recognition (ICPR), 2021
DOI: 10.1109/icpr48806.2021.9413016
Knowledge Distillation Beyond Model Compression

Cited by 25 publications (9 citation statements)
References 13 publications
Citation types: 0 supporting, 9 mentioning, 0 contrasting
“…Long-term memory builds structural representations for generalization and mimics the slow acquisition of structured knowledge in the neocortex, which can generalize well across tasks. The long-term memory then interacts with the instance-level episodic memory to retrieve structural relational knowledge (Sarfraz et al 2021) for the previous tasks encoded in the output logits. Consolidated logits are then utilized to enforce consistency in the functional space of the working model.…”
Section: Multiple Memory Systems (mentioning, confidence: 99%)
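The functional-space consistency described above can be illustrated as a distillation-style loss between the working model's logits and the consolidated logits retrieved from memory. The sketch below is a generic illustration in PyTorch; the function name, temperature, and rescaling are assumptions for exposition, not the cited work's exact formulation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(working_logits: torch.Tensor,
                     consolidated_logits: torch.Tensor,
                     T: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened output distributions,
    enforcing consistency in the functional (output) space.
    Hypothetical sketch, not the cited paper's exact loss."""
    log_p_working = F.log_softmax(working_logits / T, dim=1)
    p_consolidated = F.softmax(consolidated_logits / T, dim=1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_working, p_consolidated,
                    reduction="batchmean") * (T * T)

# Usage: working-model logits vs. logits consolidated from episodic memory
logits_working = torch.randn(8, 10)
logits_memory = torch.randn(8, 10)
loss = consistency_loss(logits_working, logits_memory)
```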
“…Roughly speaking, typical model compression methods can be classified into four categories: weight pruning, quantization, knowledge distillation (Sarfraz et al, 2021; Walawalkar et al, 2020), and low-rank decomposition (Idelbayev & Carreira-Perpinán, 2020; Lin et al, 2018; Lee et al, 2019). Even though such methods strive to find a smaller model while retaining the model's accuracy, they often tend to neglect the potential inherent in the entropy limit.…”
Section: Related Work (mentioning, confidence: 99%)
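As a concrete illustration of one of the four compression categories listed above, the following sketch factorizes a linear layer's weight matrix with a truncated SVD. This is a generic example of low-rank decomposition under assumed shapes and rank, not a specific method from the cited works.

```python
import torch

def low_rank_factorize(weight: torch.Tensor, rank: int):
    """Approximate W (out x in) as A @ B with A (out x rank) and B (rank x in)
    via truncated SVD. Illustrative sketch of the 'low-rank decomposition'
    category; rank is a free hyperparameter."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out, rank), columns scaled by singular values
    B = Vh[:rank, :]             # (rank, in)
    return A, B

W = torch.randn(512, 1024)
A, B = low_rank_factorize(W, rank=64)
# Replacing a 512x1024 layer with 512x64 and 64x1024 factors cuts parameters
# from ~524k to ~98k at the cost of some approximation error.
print(torch.norm(W - A @ B) / torch.norm(W))
```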
“…In a typical "teacher-student" knowledge distillation, a smaller student… In the original formulation, Hinton et al [20] proposed a representation distillation by way of mimicking the softened softmax output of the teacher. Better generalization can be achieved by emulating the latent feature space in addition to mimicking the output of the teacher [37,39,45,36,33].…”
Section: Related Work (mentioning, confidence: 99%)
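The two ideas quoted above (mimicking the teacher's softened softmax output and additionally emulating its latent features) can be sketched as two loss terms. This is a minimal textbook-style sketch; the temperature, weighting, and the hypothetical `proj` adapter are assumptions, not the formulation of any specific cited method.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation: KL divergence between temperature-softened
    teacher and student outputs, combined with cross-entropy on hard labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def feature_mimic_loss(student_feat, teacher_feat, proj):
    """Optional feature-space term: student features are mapped to the
    teacher's width by a (hypothetical) projection layer and matched with L2."""
    return F.mse_loss(proj(student_feat), teacher_feat)

# Usage with dummy tensors
student_logits = torch.randn(8, 100)
teacher_logits = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
loss = kd_loss(student_logits, teacher_logits, labels)
```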
“…Online knowledge distillation offers a more attractive alternative owing to its one-stage training and bidirectional knowledge distillation [47,15,26,37]. These approaches treat all (typically two) participating models equally, enabling them to learn from each other.…”
Section: Introduction (mentioning, confidence: 99%)
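The bidirectional, one-stage scheme described above can be sketched as two peer networks trained simultaneously, each combining a supervised loss with a KL term toward the other's (detached) predictions, in the spirit of deep mutual learning. The tiny architectures, optimizer settings, and the kl() helper below are illustrative assumptions, not the cited papers' exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kl(p_logits, q_logits, T=1.0):
    # KL(student || peer) on temperature-softened distributions
    return F.kl_div(F.log_softmax(p_logits / T, dim=1),
                    F.softmax(q_logits / T, dim=1),
                    reduction="batchmean") * (T * T)

# Two peers of equal standing (dummy linear classifiers for illustration)
net_a = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
net_b = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(list(net_a.parameters()) + list(net_b.parameters()), lr=0.1)

x = torch.randn(32, 1, 28, 28)            # dummy batch
y = torch.randint(0, 10, (32,))

logits_a, logits_b = net_a(x), net_b(x)
# Each peer learns from the labels and from the other's detached predictions,
# so knowledge flows in both directions within a single training stage.
loss_a = F.cross_entropy(logits_a, y) + kl(logits_a, logits_b.detach())
loss_b = F.cross_entropy(logits_b, y) + kl(logits_b, logits_a.detach())
(loss_a + loss_b).backward()
opt.step()
```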