Machine learning is one of the key building blocks in 5G and beyond [1,2,3], spanning a broad range of applications and use cases. In the context of mission-critical applications [2,4], machine learning models should be trained with fresh data samples that are generated by and dispersed across edge devices (e.g., phones, cars, access points, etc.). Collecting these raw data samples incurs significant communication overhead and may violate data privacy. In this regard, federated learning (FL) [5,6,7,8] is a promising communication-efficient and privacy-preserving solution that periodically exchanges local model parameters without sharing raw data. However, exchanging model parameters is extremely costly for modern deep neural network (NN) architectures, which often have a huge number of parameters. For instance, MobileBERT is a state-of-the-art NN architecture for on-device natural language processing (NLP) tasks, with 25 million parameters corresponding to 96 MB [9]. Training such a model by exchanging a 96 MB payload in every communication round is challenging, particularly under limited wireless resources.

The aforementioned limitation of FL has motivated the development of federated distillation (FD) [10], which exchanges only the local model outputs, whose dimension is commonly much smaller than the model size (e.g., 10 labels in the MNIST dataset). To illustrate, as shown in Figure 1.1, consider a 2-label classification example wherein each worker in FD runs local iterations with samples having either a blue or a yellow ground-truth label. For each training sample, the worker generates its prediction output distribution, termed a local logit, which is the softmax output vector of the last NN layer activations (e.g., {blue, yellow} = {0.7, 0.3} for a blue sample). At a regular interval, the worker's local logits are averaged per ground-truth label and uploaded to a parameter server, which aggregates and globally averages the local average logits across workers per ground-truth label.
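To make the logit-averaging step concrete, the following is a minimal sketch of the per-label local averaging and the server-side global averaging described above. It assumes NumPy and the illustrative 2-label setting of Figure 1.1; the function names and the per-worker data are hypothetical and not part of any FD reference implementation.

```python
import numpy as np

NUM_LABELS = 2  # e.g., {blue, yellow} in the 2-label example


def local_average_logits(softmax_outputs, labels, num_labels=NUM_LABELS):
    """Average a worker's softmax outputs (local logits) per ground-truth label."""
    sums = np.zeros((num_labels, num_labels))
    counts = np.zeros(num_labels)
    for logit, label in zip(softmax_outputs, labels):
        sums[label] += logit
        counts[label] += 1
    counts = np.maximum(counts, 1)  # avoid division by zero for unseen labels
    return sums / counts[:, None]   # row k = average logit for label k


def global_average_logits(per_worker_averages):
    """Parameter server: average the per-label local averages across workers."""
    return np.mean(np.stack(per_worker_averages), axis=0)


# Hypothetical example: two workers, each holding (softmax output, label) pairs.
worker1 = local_average_logits(
    np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]), labels=[0, 0, 1])
worker2 = local_average_logits(
    np.array([[0.9, 0.1], [0.3, 0.7]]), labels=[0, 1])

global_logits = global_average_logits([worker1, worker2])
print(global_logits)  # one globally averaged output distribution per label
```

Note that only these small per-label vectors are exchanged, which is what makes the FD payload independent of the model size.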