2020
DOI: 10.48550/arxiv.2012.00632
Preprint

Communication-Efficient Federated Distillation

Abstract: Communication constraints are one of the major challenges preventing the widespread adoption of Federated Learning systems. Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning with fundamentally different communication properties, has emerged. FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set, between the central server and the participating clients. While for conventional Federated Learnin…
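The exchange described in the abstract can be pictured with a short sketch. The snippet below is a minimal illustration of one FD communication round, assuming NumPy and purely synthetic client outputs; all function and variable names are illustrative and not taken from the paper's implementation.

```python
# Minimal sketch of one Federated Distillation round as described in the
# abstract: clients share soft labels computed on an unlabeled public data set
# instead of model parameters. Names and sizes are illustrative assumptions.
import numpy as np

def soft_labels(logits):
    """Convert a client's raw model outputs on the public set into soft labels."""
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable softmax
    return exp / exp.sum(axis=1, keepdims=True)

def aggregate(client_soft_labels):
    """Server-side step: average the clients' soft labels into ensemble targets."""
    return np.mean(np.stack(client_soft_labels, axis=0), axis=0)

# Example round with 3 clients, a 100-example public set and 10 classes.
rng = np.random.default_rng(0)
client_outputs = [rng.normal(size=(100, 10)) for _ in range(3)]  # stand-ins for model outputs
targets = aggregate([soft_labels(o) for o in client_outputs])
# Each client would now distill its local model towards `targets` on the public set.
```

Only the matrix of soft labels (public-set size times number of classes) crosses the network in such a round, which is what gives FD its distinct communication profile.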

Cited by 13 publications (27 citation statements)
References 33 publications
“…Jeong et al. and subsequent work (Jeong et al., 2018; Itahara et al., 2020; Seo et al., 2020; Sattler et al., 2020a) focus on this aspect. These methods, however, are computationally more expensive for the resource-constrained clients, as distillation needs to be performed locally, and they perform worse than parameter-averaging-based training after the same number of communication rounds.…”
Section: A. Extended Related Work Discussion (mentioning)
confidence: 99%
“…Federated Distillation (FD) algorithms, which leverage these distillation techniques to aggregate the client knowledge, have recently been gaining popularity, because they outperform conventional parameter-averaging-based FL methods (Lin et al., 2020; Chen & Chao, 2020) like FedAvg or FedProx (McMahan et al., 2017; Li et al., 2020a) and allow clients to train heterogeneous model architectures (Li & Wang, 2019; Chang et al., 2019; Li et al., 2021). FD methods can furthermore reduce communication overhead (Jeong et al., 2018; Itahara et al., 2020; Seo et al., 2020; Sattler et al., 2020a) by exploiting the fact that distillation requires only the communication of model predictions instead of full models. In contrast to centralized distillation, where training and distillation data usually coincide, FD makes no restrictions on the auxiliary distillation data, making it widely applicable.…”
Section: Related Work (mentioning)
confidence: 99%
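The communication argument in the excerpt above is easy to quantify with a back-of-envelope comparison. The concrete numbers below (model size, public-set size, class count) are illustrative assumptions, not figures from the paper; they only show the order-of-magnitude gap between shipping parameters and shipping predictions.

```python
# Illustrative per-round upload size: full model parameters (FedAvg-style)
# versus soft labels on a shared public set (FD-style). All numbers are
# assumed for illustration only.
num_params      = 11_000_000      # e.g. a ResNet-18-sized model (assumption)
public_examples = 5_000           # size of the shared unlabeled distillation set (assumption)
num_classes     = 10
bytes_per_float = 4

fedavg_upload = num_params * bytes_per_float                      # ~44 MB per client
fd_upload     = public_examples * num_classes * bytes_per_float   # ~0.2 MB per client

print(f"FedAvg upload per round: {fedavg_upload / 1e6:.1f} MB")
print(f"FD upload per round:     {fd_upload / 1e6:.1f} MB")
```

Note that the FD payload scales with the public-set size and the number of classes rather than with the model size.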
“…The paper [53] assumes that a subset of the clients holds unlabeled data, and trains a convergent model at the server to label it. The paper [54] considers using shared unlabeled data for Federated Distillation [55,56]. Another related work [57] trains and aggregates the model parameters of the labeled server and the unlabeled clients in parallel.…”
Section: Related Work (mentioning)
confidence: 99%
“…The scholars then focus on performing model compression while still following the federated learning paradigm. Knowledge distillation has then been applied in [48] to transfer local model information while keeping the model size small on the local devices. It utilises knowledge distillation instead of model parameter fusion to update parameters.…”
Section: Classification (mentioning)
confidence: 99%
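To make the contrast with parameter fusion concrete, here is a minimal sketch of a distillation-based local update, assuming a PyTorch student model and an aggregated soft-label tensor received from the server; the function name and signature are illustrative and not taken from [48] or from the paper.

```python
# Sketch of updating a client model by knowledge distillation instead of
# parameter fusion: the client never overwrites its weights with an averaged
# model, it only fits its own model to the aggregated soft labels.
# All names here are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_step(student, optimizer, public_batch, teacher_probs, temperature=1.0):
    """One local KD step: minimize KL(teacher || student) on a public-set batch."""
    optimizer.zero_grad()
    student_log_probs = F.log_softmax(student(public_batch) / temperature, dim=1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only the aggregated soft labels have to be received from the server, such an update is also compatible with clients running heterogeneous model architectures, as noted in the Related Work excerpt above.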