Efficient deployment of deep neural networks across devices with diverse resource constraints, especially on edge devices, is one of the most challenging problems when data privacy must be preserved. Conventional approaches have evolved to either improve a single global model while keeping each client's training data decentralized (i.e., data heterogeneity) or to train a once-for-all network that supports diverse architectural settings for heterogeneous systems with different computational capabilities (i.e., model heterogeneity). However, little research has considered both directions simultaneously. In this work, we propose a novel framework that addresses both scenarios, namely Federation of Supernet Training (FedSup), in which clients send and receive a supernet containing all possible architectures that can be sampled from it. The framework is inspired by the observation that averaging parameters during the model aggregation stage of Federated Learning (FL) is similar to weight-sharing in supernet training. Specifically, FedSup combines the weight-sharing approach widely used for training single-shot models with FL's parameter averaging (FedAvg). Under our framework, we further present an efficient algorithm (E-FedSup) that sends a submodel to each client in the broadcast stage to reduce communication costs and training overhead. We demonstrate several strategies to enhance supernet training in the FL environment and conduct extensive empirical evaluations. The resulting framework is shown to pave the way for robustness to both data- and model-heterogeneity on several standard benchmarks.

Recent FL works have evolved toward designing new objective functions for model aggregation [1; 25; 34; 60; 13; 71; 31], using auxiliary data on the central server [39; 73], encoding weights for an efficient communication stage [63; 23; 65], or recruiting helpful clients to obtain a more accurate global model [35; 8; 48].
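The connection between FedAvg-style aggregation and supernet weight-sharing can be sketched in a toy example. This is an illustrative simplification, not the authors' implementation: weights are flat lists, sub-networks are channel prefixes, and the function names (`sample_subnet`, `fedavg_aggregate`) are hypothetical.

```python
# Toy sketch: clients train randomly sized sub-networks of a shared supernet
# (as in E-FedSup's submodel broadcast); the server averages each weight
# position over the clients that actually trained it (FedAvg-style).

def sample_subnet(supernet, keep_ratio):
    """Keep a prefix of each layer's weights (a toy weight-sharing scheme)."""
    return {name: w[: max(1, int(len(w) * keep_ratio))]
            for name, w in supernet.items()}

def fedavg_aggregate(supernet, client_updates):
    """Average every weight position over the clients that trained it;
    positions no client touched keep their previous supernet value."""
    new = {}
    for name, w in supernet.items():
        acc = [0.0] * len(w)   # running sum per position
        cnt = [0] * len(w)     # number of clients covering each position
        for update in client_updates:
            for i, v in enumerate(update.get(name, [])):
                acc[i] += v
                cnt[i] += 1
        new[name] = [acc[i] / cnt[i] if cnt[i] else w[i]
                     for i in range(len(w))]
    return new

# Two clients return sub-networks of different widths; averaging behaves
# like weight-sharing: shared prefixes are averaged, the rest is kept.
supernet = {"fc": [1.0, 1.0, 1.0, 1.0]}
updates = [{"fc": [2.0, 2.0]}, {"fc": [4.0, 4.0, 4.0]}]
print(fedavg_aggregate(supernet, updates))  # {'fc': [3.0, 3.0, 4.0, 1.0]}
```

In this simplified view, broadcasting only `sample_subnet(supernet, keep_ratio)` instead of the full supernet is what reduces per-round communication in E-FedSup.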
On the other side, there has been tremendous recent interest in deploying FL algorithms in real-world applications such as mobile devices and the Internet of Things.

Preprint. Under review.