2020
DOI: 10.48550/arxiv.2011.02367
Preprint

Federated Knowledge Distillation

Abstract: Machine learning is one of the key building blocks in 5G and beyond [1,2,3], spanning a broad range of applications and use cases. In the context of mission-critical applications [2,4], machine learning models should be trained with fresh data samples that are generated by and dispersed across edge devices (e.g., phones, cars, access points). Collecting these raw data incurs significant communication overhead and may violate data privacy. In this regard, federated learning (FL) [5,6,7,8] is a promising…

Cited by 21 publications (24 citation statements)
References 26 publications
“…Jeong et al. and subsequent work (Jeong et al., 2018; Itahara et al., 2020; Seo et al., 2020; Sattler et al., 2020a) focus on this aspect. These methods, however, are computationally more expensive for resource-constrained clients, since distillation must be performed locally, and they perform worse than parameter-averaging-based training after the same number of communication rounds.…”
Section: A Extended Related Work Discussion (mentioning)
confidence: 99%
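The added client-side cost these works point to comes from computing a distillation term against globally aggregated predictions during every local update. Below is a minimal sketch of such a local step, assuming per-label averaged global logits in the style of federated distillation; the function name, the weight lam, and the temperature are illustrative choices, not the exact recipe of any cited paper.

```python
import torch.nn.functional as F

def local_fd_step(model, optimizer, batch, global_avg_logits, lam=0.1, temp=2.0):
    """One local training step with a federated-distillation regularizer.

    global_avg_logits: tensor of shape [num_classes, num_classes] holding a
    globally averaged logit vector per ground-truth label (illustrative layout).
    """
    x, y = batch
    logits = model(x)

    # Standard supervised loss on the client's own data.
    ce = F.cross_entropy(logits, y)

    # Extra distillation term toward the aggregated global predictions;
    # computing this KL divergence at every step is the added local cost.
    teacher = global_avg_logits[y]                     # [batch, num_classes]
    kd = F.kl_div(F.log_softmax(logits / temp, dim=1),
                  F.softmax(teacher / temp, dim=1),
                  reduction="batchmean") * (temp ** 2)

    loss = ce + lam * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```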
“…Federated Distillation (FD) algorithms, which leverage these distillation techniques to aggregate the client knowledge, have recently been gaining popularity because they outperform conventional parameter-averaging-based FL methods (Lin et al., 2020; Chen & Chao, 2020) like FedAvg or FedProx (McMahan et al., 2017; Li et al., 2020a) and allow clients to train heterogeneous model architectures (Li & Wang, 2019; Chang et al., 2019; Li et al., 2021). FD methods can furthermore reduce communication overhead (Jeong et al., 2018; Itahara et al., 2020; Seo et al., 2020; Sattler et al., 2020a) by exploiting the fact that distillation requires only the communication of model predictions instead of full models. In contrast to centralized distillation, where training and distillation data usually coincide, FD places no restrictions on the auxiliary distillation data, making it widely applicable.…”
Section: Related Work (mentioning)
confidence: 99%
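The communication saving described in this statement comes from exchanging predictions on a shared, unlabeled auxiliary dataset rather than full parameter vectors. A rough sketch of that exchange follows, assuming a public dataset available to all parties; the function names and the plain averaging rule are illustrative rather than the specific aggregation of any cited method.

```python
import torch

def client_payload(model, public_loader):
    """Client -> server: soft predictions on the shared auxiliary data only."""
    model.eval()
    outputs = []
    with torch.no_grad():
        for x in public_loader:                # unlabeled public batches
            outputs.append(torch.softmax(model(x), dim=1))
    return torch.cat(outputs)                  # [N_public, num_classes]

def aggregate_predictions(payloads):
    """Server: average the clients' soft predictions; no parameters travel."""
    return torch.stack(payloads).mean(dim=0)

# Per-round payload is N_public * num_classes floats per client, versus the
# full model parameter count for FedAvg-style weight averaging.
```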
“…The output of each model is regularized to be similar to the ensemble of predictions from all models via a distillation loss. The idea of codistillation is used by several methods to reduce the communication cost of federated learning (Sui et al., 2020; Li and Wang, 2019; Seo et al., 2020; Lin et al., 2020b; Sun and Lyu, 2020). For example, Sui et al. (2020) proposed a federated ensemble distillation approach for medical relation extraction.…”
Section: Communication Efficient FL (mentioning)
confidence: 99%
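Written out, the codistillation objective sketched in this statement regularizes each model toward the ensemble of the others. A generic formulation is given below; the KL distance and the weight $\lambda$ are common choices assumed here, not necessarily the exact loss of each cited method:

$$\mathcal{L}_k = \mathbb{E}_{(x,y)}\left[\ell\big(f_{\theta_k}(x), y\big) + \lambda\,\mathrm{KL}\!\left(\frac{1}{K-1}\sum_{j \neq k}\sigma\big(f_{\theta_j}(x)\big)\,\Big\|\,\sigma\big(f_{\theta_k}(x)\big)\right)\right],$$

where $\sigma$ is the softmax, $\ell$ the supervised loss, and $K$ the number of participating models.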
“…The NTK framework has mostly been used for convergence analyses in FL. Seo et al. (2020) studied two knowledge distillation methods in FL and compared their convergence properties based on the evolution of the neural network function in the NTK regime. Another work incorporated batch normalization layers into local models and provided theoretical justification for its faster convergence by studying the minimum nonnegative eigenvalue of the tangent kernel matrix.…”
Section: Related Work (mentioning)
confidence: 99%
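For context on the NTK-regime comparison mentioned above, the standard linearization tracks the network function rather than its parameters; this is the generic textbook form, not the specific derivation of Seo et al. (2020):

$$f(x;\theta_t) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta_t - \theta_0),$$

and under gradient flow on a squared loss the training-set predictions evolve as

$$\frac{\mathrm{d} f(X;\theta_t)}{\mathrm{d} t} = -\eta\, K(X,X)\,\big(f(X;\theta_t) - Y\big), \qquad K(X,X) = \nabla_\theta f(X;\theta_0)\,\nabla_\theta f(X;\theta_0)^\top,$$

so convergence rates are governed by the spectrum of the tangent kernel $K$, which is why the cited analyses study its smallest nonnegative eigenvalue.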