2022 IEEE 25th International Symposium on Real-Time Distributed Computing (ISORC)
DOI: 10.1109/isorc52572.2022.9812837
LRP-based Policy Pruning and Distillation of Reinforcement Learning Agents for Embedded Systems

Cited by 4 publications (2 citation statements)
References 13 publications
“…We adopt Deep Q-Network (DQN) 6 as the DRL algorithm in this article, which is one of the most widely-used variants of DRL algorithms, but the techniques presented in this article are also applicable to any other DRL algorithm as long as the policy network is a CNN. This article is an extension of our conference publication, 25 with the following new contents: a new Figure 7 to measure the memory efficiency more precisely for different pruning rates; a new Section 4.2 with additional experimental results, to show that robust RL agents obtained by a robust training algorithm 26 can generally achieve better performance (higher average reward) after pruning than standard RL agents. This article is structured as follows: we first discuss background knowledge on DQN and LRP in Section 2; our approach of LRP-based policy pruning and distillation in Section 3; performance evaluation results in Section 4, including Section 4.1 for experiments with versus without fine-tuning, and Section 4.2 for experiments with robust versus non-robust models; and conclusions in Section 5.…”
Section: Figure 1, Overview of Our Approach
confidence: 99%
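The statement above summarizes the cited article's approach: LRP relevance scores computed on a DQN's CNN policy network guide pruning, followed by fine-tuning/distillation. As a rough illustration of that general idea only (not the authors' implementation), the sketch below applies the LRP epsilon-rule to a toy Q-network and zeroes out the least relevant filters of the first convolutional layer; the network shape, pruning rate, and names such as SmallQNet and lrp_step are placeholders.

```python
# Hypothetical sketch of LRP-based filter scoring and pruning for a DQN-style
# CNN policy network. Architecture, pruning rate, and helper names are
# illustrative placeholders, not the authors' implementation.
import torch
import torch.nn as nn

class SmallQNet(nn.Module):
    """Toy Atari-style Q-network (4 stacked 84x84 frames -> Q-values)."""
    def __init__(self, n_actions=6):
        super().__init__()
        self.conv1 = nn.Conv2d(4, 16, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
        self.fc1 = nn.Linear(32 * 9 * 9, 256)
        self.fc2 = nn.Linear(256, n_actions)

    def forward(self, x):
        a1 = torch.relu(self.conv1(x))
        a2 = torch.relu(self.conv2(a1))
        a3 = torch.relu(self.fc1(a2.flatten(1)))
        return self.fc2(a3), (a1, a2, a3)

def lrp_step(layer, a, relevance, eps=1e-6):
    """One LRP epsilon-rule step: redistribute relevance from a layer's output
    back to its input, via the standard autograd 'gradient trick'."""
    a = a.clone().detach().requires_grad_(True)
    z = layer(a)
    stabilizer = eps * torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))
    s = (relevance / (z + stabilizer)).detach()
    (c,) = torch.autograd.grad((z * s).sum(), a)
    return a * c

net = SmallQNet()
state = torch.rand(1, 4, 84, 84)                 # dummy stacked-frame observation
q, (a1, a2, a3) = net(state)

# Relevance starts at the greedy action's Q-value and is propagated backwards.
R = torch.zeros_like(q)
R[0, q.argmax()] = q.max()
R = lrp_step(net.fc2, a3, R)
R = lrp_step(net.fc1, a2.flatten(1), R).view_as(a2)
R = lrp_step(net.conv2, a1, R)                   # relevance at conv2's input, i.e. per conv1 filter

# Score each conv1 filter by its total relevance and zero out the least
# relevant fraction (a stand-in for physically removing the filters).
filter_scores = R.sum(dim=(0, 2, 3))
prune_ratio = 0.5                                # hypothetical pruning rate
pruned = filter_scores.argsort()[: int(prune_ratio * filter_scores.numel())]
with torch.no_grad():
    net.conv1.weight[pruned] = 0.0
    net.conv1.bias[pruned] = 0.0
print("Pruned conv1 filters:", pruned.tolist())
```

In practice the pruned network would then be fine-tuned or distilled from the original agent, and relevance would typically be aggregated over many states rather than a single observation as in this sketch.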
“…This article is an extension of our conference publication, 25 with the following new contents: a new Figure 7 to measure the memory efficiency more precisely for different pruning rates; a new Section 4.2 with additional experimental results, to show that robust RL agents obtained by a robust training algorithm 26 can generally achieve better performance (higher average reward) after pruning than standard RL agents.…”
Section: Introduction and Related Work
confidence: 99%