2022 IEEE 25th International Symposium on Real-Time Distributed Computing (ISORC)
DOI: 10.1109/isorc52572.2022.9812837
LRP-based Policy Pruning and Distillation of Reinforcement Learning Agents for Embedded Systems

Cited by 4 publications (2 citation statements)
References 13 publications
“…We adopt Deep Q-Network (DQN) 6 as the DRL algorithm in this article, which is one of the most widely-used variants of DRL algorithms, but the techniques presented in this article are also applicable to any other DRL algorithm as long as the policy network is a CNN. This article is an extension of our conference publication, 25 with the following new contents: a new Figure 7 to measure the memory efficiency more precisely for different pruning rates; a new Section 4.2 with additional experimental results, to show that robust RL agents obtained by a robust training algorithm 26 can generally achieve better performance (higher average reward) after pruning than standard RL agents. This article is structured as follows: we first discuss background knowledge on DQN and LRP in Section 2; our approach of LRP-based policy pruning and distillation in Section 3; performance evaluation results in Section 4, including Section 4.1 for experiments with versus without fine-tuning, and Section 4.2 for experiments with robust versus non-robust models; and conclusions in Section 5.…”
Section: Figure 1, Overview of Our Approach
confidence: 99%
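The statement above summarizes the cited article's approach: LRP relevance scores computed on a DQN's CNN policy network guide pruning, followed by fine-tuning/distillation. As a rough illustration of that general idea only (not the authors' implementation), the sketch below applies the LRP epsilon-rule to a toy Q-network and zeroes out the least relevant filters of the first convolutional layer; the network shape, pruning rate, and names such as SmallQNet and lrp_step are placeholders.

```python
# Hypothetical sketch of LRP-based filter scoring and pruning for a DQN-style
# CNN policy network. Architecture, pruning rate, and helper names are
# illustrative placeholders, not the authors' implementation.
import torch
import torch.nn as nn

class SmallQNet(nn.Module):
    """Toy Atari-style Q-network (4 stacked 84x84 frames -> Q-values)."""
    def __init__(self, n_actions=6):
        super().__init__()
        self.conv1 = nn.Conv2d(4, 16, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
        self.fc1 = nn.Linear(32 * 9 * 9, 256)
        self.fc2 = nn.Linear(256, n_actions)

    def forward(self, x):
        a1 = torch.relu(self.conv1(x))
        a2 = torch.relu(self.conv2(a1))
        a3 = torch.relu(self.fc1(a2.flatten(1)))
        return self.fc2(a3), (a1, a2, a3)

def lrp_step(layer, a, relevance, eps=1e-6):
    """One LRP epsilon-rule step: redistribute relevance from a layer's output
    back to its input, via the standard autograd 'gradient trick'."""
    a = a.clone().detach().requires_grad_(True)
    z = layer(a)
    stabilizer = eps * torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))
    s = (relevance / (z + stabilizer)).detach()
    (c,) = torch.autograd.grad((z * s).sum(), a)
    return a * c

net = SmallQNet()
state = torch.rand(1, 4, 84, 84)                 # dummy stacked-frame observation
q, (a1, a2, a3) = net(state)

# Relevance starts at the greedy action's Q-value and is propagated backwards.
R = torch.zeros_like(q)
R[0, q.argmax()] = q.max()
R = lrp_step(net.fc2, a3, R)
R = lrp_step(net.fc1, a2.flatten(1), R).view_as(a2)
R = lrp_step(net.conv2, a1, R)                   # relevance at conv2's input, i.e. per conv1 filter

# Score each conv1 filter by its total relevance and zero out the least
# relevant fraction (a stand-in for physically removing the filters).
filter_scores = R.sum(dim=(0, 2, 3))
prune_ratio = 0.5                                # hypothetical pruning rate
pruned = filter_scores.argsort()[: int(prune_ratio * filter_scores.numel())]
with torch.no_grad():
    net.conv1.weight[pruned] = 0.0
    net.conv1.bias[pruned] = 0.0
print("Pruned conv1 filters:", pruned.tolist())
```

In practice the pruned network would then be fine-tuned or distilled from the original agent, and relevance would typically be aggregated over many states rather than a single observation as in this sketch.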
“…This article is an extension of our conference publication, 25 with the following new contents: a new Figure 7 to measure the memory efficiency more precisely for different pruning rates; a new Section 4.2 with additional experimental results, to show that robust RL agents obtained by a robust training algorithm 26 can generally achieve better performance (higher average reward) after pruning than standard RL agents.…”
Section: Introduction and Related Work
confidence: 99%