2019
DOI: 10.48550/arxiv.1902.03393
Preprint

Improved Knowledge Distillation via Teacher Assistant

Abstract: Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large (teacher) pre-trained network is used to train a smaller (student) network. However, in this paper, we show that the student network performance degrades when the gap between student and teacher is large…
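The distillation the abstract refers to is the standard soft-target formulation of Hinton et al.; as a point of reference, a minimal sketch of that loss is given below, assuming an ordinary classification setting in PyTorch. The name kd_loss and the default temperature T and mixing weight alpha are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style knowledge distillation loss (illustrative sketch).

    Mixes a soft-target term (KL divergence between temperature-softened
    teacher and student distributions) with the usual cross entropy on the
    hard labels. T and alpha are hypothetical defaults.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

The paper's observation is that, for a fixed student, making the teacher in this loss ever larger eventually hurts rather than helps, which is what the teacher assistant is meant to address.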

Cited by 8 publications (15 citation statements)
References 0 publications
“…Yu et al. [100] proposed two new loss functions to model the communication between the deep teacher network and the small student network: one based on the absolute teacher and the other on the relative teacher. Mirzadeh et al. [101] introduced a multi-step knowledge distillation technique that uses a medium-sized network (teacher assistant) to bridge the gap between the student and the teacher.…”
Section: A. Knowledge From Logits
confidence: 99%
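The multi-step scheme described in this snippet amounts to chaining two ordinary distillation runs: the teacher is first distilled into a medium-sized teacher assistant, and the assistant is then distilled into the final student. A minimal sketch follows, reusing the kd_loss function sketched above; train_with_kd and make_net are hypothetical helpers introduced only for illustration.

```python
import torch

def train_with_kd(student, teacher, loader, epochs=1, lr=0.01, T=4.0, alpha=0.9):
    """Distill a frozen teacher into a student over one data loader (illustrative)."""
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    teacher.eval()
    student.train()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                teacher_logits = teacher(x)  # teacher provides soft targets only
            loss = kd_loss(student(x), teacher_logits, y, T=T, alpha=alpha)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Hypothetical capacity chain (make_net is a placeholder model factory):
# distill a large teacher into a mid-sized assistant, then the assistant
# into the small student.
#   assistant = train_with_kd(make_net(depth=6), teacher, loader)
#   student   = train_with_kd(make_net(depth=2), assistant, loader)
```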
“…Method | Knowledge from | Details
Hinton et al. [97] | Logits | Cross entropy
Huang et al. [99] | Logits | Cross entropy and maximum mean discrepancy
Yu et al. [100] | Logits | Hints and attention
Mirzadeh et al. [101] | Logits | Use of a teacher assistant
Romero et al. [102] | Intermediate layers | MSE loss at a certain middle layer
Yim et al. [103] | Intermediate layers | Gram matrix loss at multiple middle layers
Zagoruyko et al. [104] | Intermediate layers | Attention transfer loss at multiple middle layers
Zhang et al. [105] | Intermediate layers | Adaptive selection of a middle layer
Peng et al. [106] | Mutual information | Correlation between multiple instances
Crowley et al. [107] | Self structures | Same structure, with cheap convolution blocks
Park et al. [108] | Structured knowledge | Relational potential function to transfer the information
Lopez-Paz et al. [109] | Privileged information | Pair-wise and holistic distillation between two neural networks; an information theory framework for knowledge transfer
Tung et al. [112] proposed a new form of KD loss, inspired by the observation that similar inputs produce similar activation patterns in a well-trained network.…”
Section: References
confidence: 99%
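As a concrete instance of the intermediate-layer entries in the table above, a FitNets-style hint loss matches one student feature map to one teacher feature map through a small regressor. The sketch below is an assumption-laden illustration: the class name HintLoss, the 1x1 convolutional regressor, and the matching spatial sizes are choices made here, not details from the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """FitNets-style hint loss on a single intermediate layer (illustrative).

    A 1x1 convolution maps the student's feature map to the teacher's channel
    count before taking the MSE; the two feature maps are assumed to share
    the same spatial resolution.
    """
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        return F.mse_loss(self.regressor(student_feat), teacher_feat)
```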
“…In [33], the capacity gap between the large teacher model and the student has been investigated. It shows that the relationship between the architectures of the teacher and student models is very important.…”
Section: A. Image Classification
confidence: 99%
“…Deep neural network (DNN)-driven algorithms now stand as the state of the art in a variety of domains, from perceptual tasks such as computer vision, speech and language processing to, more recently, control tasks such as robotics (Mirzadeh et al 2019), (Bastani, Pu, and Solar-Lezama 2018). Nevertheless, there is often reason to avoid direct use of DNNs.…”
Section: Introduction
confidence: 99%
“…For example, the training from scratch or hyperparameter tuning of such networks can be prohibitively expensive or time-consuming (Schmitt et al. 2018). For some applications, the size or complexity of such DNNs precludes their use in real time, or deployment on edge devices with limited processing resources (Chen et al. 2017), (Mirzadeh et al. 2019). In other areas, such as flight control or self-driving cars, DNNs are sidelined (at least for mass deployment) by their opaqueness or lack of decision-making interpretability (Bastani, Kim, and Bastani 2017), (Hind et al. 2019).…”
Section: Introduction
confidence: 99%