“…In the last few years, a variety of knowledge distillation methods have been widely used for model compression in different visual recognition applications. Specifically, most knowledge distillation methods were first developed for image classification (Li and Hoiem, 2017; Peng et al., 2019b; Bagherinezhad et al., 2018; Chen et al., 2018a; Wang et al., 2019b; Mukherjee et al., 2019; Zhu et al., 2019) and then extended to other visual recognition applications, including face recognition (Luo et al., 2016; Kong et al., 2019; Yan et al., 2019; Ge et al., 2018; Wang et al., 2018b, 2019c; Duong et al., 2019; Wu et al., 2020; Wang et al., 2017), action recognition (Hao and Zhang, 2019; Thoker and Gall, 2019; Luo et al., 2018; Garcia et al., 2018; Wu et al., 2019b; Zhang et al., 2020), object detection (Hong and Yu, 2019; Shmelkov et al., 2017; Wei et al., 2018; Wang et al., 2019d), lane detection (Hou et al., 2019), image or video segmentation (He et al., 2019; Liu et al., 2019g; Mullapudi et al., 2019; Siam et al., 2019; Dou et al., 2020), video classification (Bhardwaj et al., 2019; Zhang and Peng, 2018), pedestrian detection (Shen et al., 2016), facial landmark detection (Dong and Yang, 2019), and person re-identification (Wu et al., 2019a)…”