2019
DOI: 10.48550/arxiv.1904.01769
Preprint

M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning

Abstract: Incremental learning aims to achieve good performance on new categories without forgetting old ones. Knowledge distillation has been shown to be critical in preserving performance on old classes. Conventional methods, however, sequentially distill knowledge only from the last model, leading to performance degradation on the old classes in later incremental learning steps. In this paper, we propose a multi-model and multi-level knowledge distillation strategy. Instead of sequentially distilling knowledge onl…
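The truncated abstract contrasts distilling only from the most recent model with distilling from every previous model. A minimal sketch of that idea is given below; it is an illustrative assumption about the loss (names such as m2kd_style_loss, the class_ranges bookkeeping, and the temperature T are chosen here for clarity), not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # Standard Hinton-style KD: KL divergence between temperature-softened outputs.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def m2kd_style_loss(current_model, old_models, class_ranges, x, y, kd_weight=1.0):
    """Cross-entropy on the new data plus a distillation term from *every*
    stored old model, not just the last one.

    old_models[i]   -- frozen model snapshot saved after incremental step i
    class_ranges[i] -- (lo, hi) output indices that snapshot is responsible for
    """
    logits = current_model(x)
    loss = F.cross_entropy(logits, y)
    for old_model, (lo, hi) in zip(old_models, class_ranges):
        with torch.no_grad():
            old_logits = old_model(x)[:, lo:hi]  # old model's own classes only
        loss = loss + kd_weight * distill_loss(logits[:, lo:hi], old_logits)
    return loss
```

In this form each frozen snapshot supervises only the slice of output logits for the classes it was originally trained on, which is the sense in which the distillation is "multi-model"; the "multi-level" component of M2KD is not sketched here.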

Cited by 9 publications (11 citation statements); references 32 publications.

Citation statements (ordered by relevance):
“…Knowledge distillation provides an effective way to preserve and transfer learned knowledge without catastrophic forgetting. Recently, an increasing number of KD variants based on lifelong learning have been developed (Jang et al, 2019; Flennerhag et al, 2019; Peng et al, 2019b; Liu et al, 2019d; Lee et al, 2019b; Zhai et al, 2019; Zhou et al, 2019c; Shmelkov et al, 2017; Li and Hoiem, 2017). The methods proposed in (Jang et al, 2019; Peng et al, 2019b; Liu et al, 2019d; Flennerhag et al, 2019) adopt meta-learning.…”
Section: Lifelong Distillation
confidence: 99%
“…The teacher knowledge obtained from the image modalities and semantic information is preserved and transferred. Moreover, to address the problem of catastrophic forgetting in lifelong learning, global distillation (Lee et al, 2019b), multi-model distillation (Zhou et al, 2019c), a knowledge distillation-based lifelong GAN (Zhai et al, 2019) and other KD-based methods (Li and Hoiem, 2017; Shmelkov et al, 2017) have been developed to extract the learned knowledge and teach the student network on new tasks.…”
Section: Lifelong Distillation
confidence: 99%
“…The LwF method does not require any old data to be stored and uses KD as an additional regularization term on the loss function to force the new model to follow the behavior of the old model on old tasks. Zhou et al (2019) proposed a multi-model distillation method called M2KD which directly matches the category outputs of the current model with those of the corresponding old models. Mask-based pruning is used to compress the old models in M2KD.…”
Section: KD-Based Incremental Learning Methods
confidence: 99%
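The statement above pins down the two loss formulations being compared: LwF's single-teacher regularizer versus M2KD's per-model output matching (sketched earlier after the abstract). For contrast, a minimal LwF-style term might look like the following; lwf_loss, num_old_classes and the weight lam are illustrative assumptions, not code from either paper.

```python
import torch
import torch.nn.functional as F

def lwf_loss(new_model, old_model, x, y, num_old_classes, T=2.0, lam=1.0):
    # Learning-without-Forgetting style objective: cross-entropy on the new task
    # plus a KD regularizer that keeps the new model close to the behaviour of
    # the single previous model on the old classes (no old data stored).
    logits = new_model(x)
    ce = F.cross_entropy(logits, y)
    with torch.no_grad():
        old_targets = F.softmax(old_model(x) / T, dim=1)   # old model's soft outputs
    kd = F.kl_div(F.log_softmax(logits[:, :num_old_classes] / T, dim=1),
                  old_targets, reduction="batchmean") * (T * T)
    return ce + lam * kd
```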
“…Li et al first applied KD for incremental learning and built an incremental classifier called Learning without Forgetting (LwF) [19]. Research in this type of method aims to acquire the old task information through various distillation approaches [39], [40]. Architectural Strategies solve the catastrophic forgetting problem by designing a unique sub-network for each incremental task.…”
Section: A. Incremental Classification
confidence: 99%