2020
DOI: 10.1109/mis.2020.2994942

Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning

Abstract: Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, it may incur huge communication overhead while violating the privacy of each agent. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience replay mem…
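As described in the abstract, the object exchanged in FRD is a proxy experience replay memory: a small set of representative "proxy" states, each paired with the agent's averaged policy output, standing in for the raw replay memory. The sketch below is a hypothetical illustration of one way such a memory could be built; the k-means-style state grouping, the function name, and the array shapes are assumptions for illustration, not the paper's exact procedure.

import numpy as np

def build_proxy_memory(states, policy_probs, n_proxy_states, n_iters=10, seed=0):
    # states: (N, d) observed states; policy_probs: (N, A) policy outputs.
    # Returns a compact (proxy state, averaged policy) table that an agent
    # could upload instead of its raw replay memory.
    rng = np.random.default_rng(seed)
    centroids = states[rng.choice(len(states), n_proxy_states, replace=False)].astype(float)

    for _ in range(n_iters):  # simple k-means-style grouping of states
        dists = np.linalg.norm(states[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(n_proxy_states):
            if np.any(assign == k):
                centroids[k] = states[assign == k].mean(axis=0)

    # Average the policy outputs of all states mapped to each proxy state.
    proxy_policy = np.stack([
        policy_probs[assign == k].mean(axis=0) if np.any(assign == k)
        else np.zeros(policy_probs.shape[1])
        for k in range(n_proxy_states)
    ])
    return centroids, proxy_policy

Only this small table needs to be communicated, which is where the communication-efficiency and privacy claims in the abstract come from.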

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
11
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
6
2

Relationship

1
7

Authors

Journals

Cited by 27 publications (11 citation statements)
References 4 publications
“…Several recent studies investigate techniques to partition DNN models and distribute the processing load. As for distributed computing, Jeong et al. [42] and Cha et al. [43] discuss distributed training methodologies called federated distillation. Different from these studies, our focus is on split computing for efficient inference rather than training, and our proposed method, head network distillation, can be executed on a single machine.…”
Section: Related Work (mentioning)
confidence: 99%
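For context on the "federated distillation" these related works refer to, its defining trait is that agents exchange compact model outputs rather than model parameters or raw data. The fragment below is a rough, assumed sketch of that exchange for a classifier (per-label average logits uploaded and averaged); the names and shapes are illustrative, not the cited papers' exact algorithms.

import numpy as np

def local_average_logits(logits, labels, n_classes):
    # Per-label average of one device's model outputs; this small
    # (n_classes x n_outputs) table is what gets uploaded, not the weights.
    table = np.zeros((n_classes, logits.shape[1]))
    for c in range(n_classes):
        mask = labels == c
        if mask.any():
            table[c] = logits[mask].mean(axis=0)
    return table

def aggregate_logit_tables(per_device_tables):
    # Server side: element-wise average across devices; each device then
    # uses the averaged table as a distillation target for local training.
    return np.mean(np.stack(per_device_tables), axis=0)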
“…The performance of FRD is evaluated in terms of the mission completion time, and is compared with two baseline distributed RL frameworks: PD [35] and federated reinforcement learning (FRL), which exchanges actor NN model parameters following the standard FL operations [5,6,7,8,36]. Each agent runs an A2C model comprising a pair of actor and critic NNs [37], each of which is a multi-layer perceptron (MLP) with 2 hidden layers.…”
Section: Experiments and Discussion (mentioning)
confidence: 99%
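The A2C model referred to in this statement is a pair of actor and critic networks, each a two-hidden-layer MLP. A minimal PyTorch sketch of such a model is given below; the hidden width, activation choice, and class name are assumptions for illustration rather than the paper's exact configuration.

import torch
import torch.nn as nn

class ActorCriticMLP(nn.Module):
    # One agent's A2C model: separate actor and critic MLPs,
    # each with two hidden layers (sizes assumed here).
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),   # policy logits
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),           # state-value estimate
        )

    def forward(self, obs):
        return self.actor(obs), self.critic(obs)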
“…Lastly, given that each agent stores a pair of actor and critic NNs, there are three possibilities for what to exchange across agents: only actor NNs, only critic NNs, or both actor and critic NNs. As seen in several experiments [24,36], exchanging only actor NNs, i.e., policy NNs, achieves convergence as fast as exchanging both actor and critic NNs, while saving communication cost by omitting the critic NNs. Hereafter we thus focus on an FRD implementation with the experience memory constructed from the actor NN outputs.…”
Section: Federated Reinforcement Distillation With Proxy Experience M... (mentioning)
confidence: 99%
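To make the actor-only exchange concrete, the sketch below shows the kind of payload an agent might upload: its actor's policy outputs evaluated on a set of proxy states, with the critic kept strictly local. The names actor and proxy_states are hypothetical placeholders for the agent's policy MLP and its locally constructed proxy states.

import torch

def actor_only_payload(actor, proxy_states):
    # Only the actor (policy) outputs over the proxy states are shared;
    # critic parameters and raw experiences never leave the agent.
    with torch.no_grad():
        policy = torch.softmax(actor(proxy_states), dim=-1)
    return proxy_states, policy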
“…This naive implementation of FL may cause excessive communication overheads between agents and the central server. Recent studies [6], [7] that try to combine distributed DRL with FL mostly focus on improving the involved agents' capability and their collaboration efficiency, while ignoring these excessive communication overheads. In this paper, we introduce the periodic averaging method from FL to alleviate this problem [8], in which agents are allowed to perform several local updates to the model within a period before their local gradients are transmitted to the central server for averaging.…”
Section: Introduction (mentioning)
confidence: 99%
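To illustrate the periodic averaging idea described in this statement, the sketch below lets each agent perform several local updates before the server aggregates; it averages the resulting model copies, which, starting from a shared global model, is equivalent in spirit to averaging the agents' accumulated local updates. The helper names and the assumption of float-only parameters are illustrative, not the cited method's exact implementation.

import copy
import torch

def periodic_averaging(global_model, agents, make_optimizer, local_steps, rounds):
    # agents: iterable of (loss_fn, data) pairs, one per agent (assumed interface).
    for _ in range(rounds):
        local_states = []
        for loss_fn, data in agents:
            model = copy.deepcopy(global_model)   # start each period from the global model
            opt = make_optimizer(model.parameters())
            for _ in range(local_steps):          # several local updates per period
                opt.zero_grad()
                loss_fn(model, data).backward()
                opt.step()
            local_states.append(model.state_dict())
        # Server: element-wise average of the agents' parameters.
        averaged = {k: torch.stack([s[k] for s in local_states]).mean(dim=0)
                    for k in global_model.state_dict()}
        global_model.load_state_dict(averaged)
    return global_model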