A Field Guide to Federated Optimization

Wang, Jianyu; Charles, Zachary; Xu, Zheng; Joshi, Gauri; McMahan, H. Brendan; Arcas, Blaise Agüera y; Al-Shedivat, Maruan; Andrew, Galen; Avestimehr, Salman; Daly, Katharine; Data, Deepesh; Diggavi, Suhas; Eichner, Hubert; Gadhikar, Advait; Garrett, Zachary; Girgis, Antonious M.; Hanzely, Filip; Hard, Andrew; He, Chaoyang; Horváth, Samuel; Huo, Zhouyuan; Ingerman, Alex; Jäggi, Martin; Javidi, Tara; Kairouz, Peter; Kale, Satyen; Karimireddy, Sai Praneeth; Konečný, Jakub; Koyejo, Sanmi; Li, Tian; Liu, Luyang; Mohri, Mehryar; Hu, Qi; Reddi, Sashank J.; Richtárik, Peter; Singhal, Karan; Smith, Virginia; Soltanolkotabi, Mahdi; Song, Weikang; Suresh, Ananda Theertha; Stich, Sebastian U.; Talwalkar, Ameet; Wang, Hongyi; Woodworth, Blake; Wu, Shanshan; Yu, Felix X.; Yuan, Honglin; Zaheer, Manzil; Mi, Zhang; Zhang, Tong; Chen, Zheng; Zhu, Chen; Zhu, Wennan

doi:10.48550/arxiv.2107.06917

Cited by 68 publications (91 citation statements)

References 165 publications (242 reference statements)


“…Federated Learning. Federated learning (FL) distributes machine learning model to the resource-constrained edges from which data originate, emerged as a promising alternative machine learning paradigm [23,25,33,34]. FL enables a multitude of participants to construct a joint model without sharing their private training data [4,22,23,25].…”

Section: Related Work (mentioning; confidence: 99%)

“…Although there are various FL frameworks nowadays, a most general FL paradigm consists of the following steps: (1) the server sends the global model to selected clients in each communication round, (2) each selected client trains the local model with its private data, (3) the clients send their trained local models back to the server, and (4) the server aggregates the local models to update the global model and repeats the first step until the global model converges. However, the FL paradigm is still a general definition and would face many challenges in practice [9,34]. One of the urgent challenges of FL is heterogeneity that includes both data heterogeneity and system heterogeneity.…”

Section: Introduction (mentioning; confidence: 99%)
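The four-step round described in the quotation above can be sketched as follows. This is a minimal, hypothetical illustration, assuming a linear least-squares model, full-batch local gradient steps, and aggregation weighted by local dataset size (a FedAvg-style choice). The function names `local_train` and `fl_round` and the synthetic data are illustrative and are not taken from any of the cited papers.

```python
import numpy as np

def local_train(w, X, y, lr, local_steps):
    """Step (2): run a few full-batch gradient steps on the client's least-squares loss."""
    w = w.copy()
    for _ in range(local_steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fl_round(w_global, clients, num_selected, lr, local_steps, rng):
    """One communication round of the generic FL paradigm (hypothetical sketch)."""
    # Step (1): the server samples clients and sends each one the global model.
    selected = rng.choice(len(clients), size=num_selected, replace=False)
    local_models, sizes = [], []
    for i in selected:
        X, y = clients[i]
        # Steps (2)-(3): train locally on private data, then send the model back.
        local_models.append(local_train(w_global, X, y, lr, local_steps))
        sizes.append(len(y))
    # Step (4): aggregate, weighting each local model by its local dataset size.
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(p * m for p, m in zip(weights, local_models))

# Usage on synthetic, non-IID clients (feature means shift from client to client).
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
clients = []
for k in range(10):
    X = rng.normal(loc=0.1 * k, size=(20 + 5 * k, 3))
    y = X @ w_true + 0.01 * rng.normal(size=len(X))
    clients.append((X, y))

w = np.zeros(3)
for _ in range(50):  # a fixed round count stands in for "until the global model converges"
    w = fl_round(w, clients, num_selected=4, lr=0.05, local_steps=5, rng=rng)
```

Here every client's data comes from the same `w_true`, so the rounds drive the global model toward it; with genuinely conflicting client optima, the aggregate would settle at a compromise instead.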


FedHM: Efficient Federated Learning for Heterogeneous Models via Low-rank Factorization

Yao, Pan, Wan et al. (2021). Preprint.


The underlying assumption of recent federated learning (FL) paradigms is that local models usually share the same network architecture as the global model, which becomes impractical for mobile and IoT devices with different setups of hardware and infrastructure. A scalable federated learning framework should address heterogeneous clients equipped with different computation and communication capabilities. To this end, this paper proposes FedHM, a novel federated model compression framework that distributes the heterogeneous low-rank models to clients and then aggregates them into a global full-rank model. Our solution enables the training of heterogeneous local models with varying computational complexities and aggregates a single global model. Furthermore, FedHM not only reduces the computational complexity of the device, but also reduces the communication cost by using low-rank models. Extensive experimental results demonstrate that our proposed FedHM outperforms the current pruning-based FL approaches in terms of test Top-1 accuracy (4.6% accuracy gain on average), with smaller model size (1.5× smaller on average) under various heterogeneous FL settings.
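FedHM's abstract describes sending heterogeneous low-rank models to clients and aggregating them into one full-rank global model. The sketch below shows one simple way such a pipeline could look for a single weight matrix, assuming a truncated-SVD factorization W ≈ UV and plain weighted averaging of the reconstructed products on the server. It is not FedHM's actual algorithm: the helpers `factorize` and `aggregate_full_rank` and the client ranks are hypothetical, and local training is omitted.

```python
import numpy as np

def factorize(W, rank):
    """Truncated SVD split W ≈ U @ V, with U of shape (m, rank) and V of shape (rank, n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:rank])
    # Split the singular values evenly between the two factors.
    return U[:, :rank] * root, root[:, None] * Vt[:rank]

def aggregate_full_rank(factor_pairs, weights):
    """Rebuild each client's full-size matrix and return their weighted average."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(p * (U @ V) for p, (U, V) in zip(weights, factor_pairs))

# Usage: one 64x32 layer, three clients with different (hypothetical) capacity ranks.
rng = np.random.default_rng(0)
W_global = rng.normal(size=(64, 32))
client_ranks = [4, 8, 16]
client_factors = [factorize(W_global, r) for r in client_ranks]
# Parameters each client holds for this layer: rank * (m + n) instead of m * n = 2048.
params = [r * (64 + 32) for r in client_ranks]  # [384, 768, 1536]
# Local training on the factors would happen here; it is omitted in this sketch.
W_next = aggregate_full_rank(client_factors, weights=[1, 1, 1])
```

Averaging the reconstructed products, rather than the factors themselves, sidesteps the fact that clients hold factors of different shapes; whether FedHM aggregates this way is not stated in the abstract.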

Section: Related Work (mentioning; confidence: 99%)

Section: Introduction (mentioning; confidence: 99%)


“…Other variants of FedAvg include letting clients run different number of steps per round, or average the client states nonuniformly. We refer readers to Wang et al (2021) for a more comprehensive survey of these extensions.…”

Section: Related Work (mentioning; confidence: 99%)
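The two FedAvg variants mentioned in this quotation, clients running different numbers of local steps and nonuniform averaging of client states, can be illustrated with a toy one-dimensional sketch. It assumes each client minimizes a hypothetical quadratic f_i(w) = (w - c_i)^2 / 2 with plain gradient steps; `fedavg_round`, `run`, and the client targets are illustrative and not from the cited paper.

```python
import numpy as np

def local_sgd(w, target, lr, steps):
    """Run `steps` gradient steps on f_i(w) = 0.5 * (w - target)**2."""
    for _ in range(steps):
        w = w - lr * (w - target)
    return w

def fedavg_round(w, targets, steps_per_client, client_weights, lr):
    """One round in which clients may take different numbers of local steps."""
    p = np.asarray(client_weights, dtype=float)
    p = p / p.sum()  # (possibly nonuniform) averaging weights, normalized to sum to 1
    states = [local_sgd(w, c, lr, k) for c, k in zip(targets, steps_per_client)]
    return float(np.dot(p, states))

def run(steps_per_client, rounds=200):
    w = 0.0
    for _ in range(rounds):
        w = fedavg_round(w, targets=[0.0, 1.0], steps_per_client=steps_per_client,
                         client_weights=[1, 1], lr=0.5)
    return w

w_equal = run([1, 1])    # equal local work: converges to the weighted mean 0.5
w_unequal = run([1, 4])  # client 2 takes more steps: converges to 15/23 (about 0.652)
```

In this toy case the fixed point is a weighted mean of the client targets with weights proportional to p_i(1 - (1 - lr)^k_i), so unequal local work shifts the result toward the client that takes more steps, even though the averaging weights themselves are uniform.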

Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective

Glasgow, Yuan (2021). Preprint. Self-cite.


Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the convergence rate of FedAvg has thus far been undetermined. Even under the simplest assumptions (convex, smooth, homogeneous, and bounded covariance), the best known upper and lower bounds do not match, and it is not clear whether the existing analysis captures the capacity of the algorithm. In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows the existing FedAvg upper bound analysis is not improvable. Additionally, we establish a lower bound in a heterogeneous setting that nearly matches the existing upper bound. While our lower bounds show the limitations of FedAvg, under an additional assumption of third-order smoothness, we prove more optimistic state-of-the-art convergence results in both convex and non-convex settings. Our analysis stems from a notion we call iterate bias, which is defined by the deviation of the expectation of the SGD trajectory from the noiseless gradient descent trajectory with the same initialization. We prove novel sharp bounds on this quantity, and show intuitively how to analyze this quantity from a Stochastic Differential Equation (SDE) perspective.
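The abstract defines iterate bias as the gap between the expected SGD iterate and the noiseless gradient-descent iterate from the same starting point. A small Monte Carlo sketch makes this concrete on one-dimensional toy objectives, assuming additive Gaussian gradient noise. The helper `iterate_bias` and the two test functions are hypothetical and do not reproduce the paper's local-SGD analysis.

```python
import numpy as np

def iterate_bias(grad, x0, lr, sigma, steps, n_runs, seed=0):
    """Monte Carlo estimate of E[x_T^SGD] - x_T^GD from a shared start x0."""
    rng = np.random.default_rng(seed)
    # Noiseless gradient descent trajectory.
    x_gd = x0
    for _ in range(steps):
        x_gd = x_gd - lr * grad(x_gd)
    # Many independent SGD runs, vectorized over runs, with Gaussian gradient noise.
    x_sgd = np.full(n_runs, x0, dtype=float)
    for _ in range(steps):
        x_sgd = x_sgd - lr * (grad(x_sgd) + sigma * rng.standard_normal(n_runs))
    return x_sgd.mean() - x_gd

common = dict(x0=0.0, lr=0.1, sigma=1.0, steps=200, n_runs=100_000)
# Quadratic f(x) = x**2 / 2: linear gradient, zero third derivative.
bias_quadratic = iterate_bias(lambda x: x, **common)
# f(x) = exp(x) - x: same minimizer x* = 0, but a nonzero third derivative.
bias_exp = iterate_bias(lambda x: np.exp(x) - 1.0, **common)
```

With a linear gradient the expected SGD iterate follows gradient descent exactly, so the first estimate is only Monte Carlo noise. With the exponential objective, gradient descent stays at the minimizer while the SGD mean settles slightly below it (roughly -0.026 here), a nonzero iterate bias that appears only once the curvature itself varies.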


“…The central server communicates with the clients to train a machine learning model using the local data stored on the clients. Federated learning is often modeled as a distributed optimization problem (Konečný et al, 2016a,b; McMahan et al, 2017; Kairouz et al, 2019; Wang et al, 2021). Let D be the entire dataset distributed across all N clients/devices/workers/machines, where each client i has a local dataset D_i.…”

Section: Introduction (mentioning; confidence: 99%)
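The formulation in this quotation, with a dataset D split into local datasets D_i across N clients, is commonly written as minimizing a weighted sum of local objectives. Assuming each client's objective F_i is its average loss over D_i and the weights are p_i = |D_i| / |D| (one common choice, not stated in the quotation), the weighted sum equals the average loss on the pooled dataset D. The sketch checks this numerically for a hypothetical squared-error model; the helper names are illustrative.

```python
import numpy as np

def local_objective(w, X, y):
    """F_i(w): mean squared-error loss over client i's local dataset D_i."""
    return 0.5 * np.mean((X @ w - y) ** 2)

def federated_objective(w, clients):
    """F(w) = sum_i p_i * F_i(w) with p_i = |D_i| / |D|."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    p = sizes / sizes.sum()
    return sum(p_i * local_objective(w, X, y) for p_i, (X, y) in zip(p, clients))

# Usage: N = 5 clients holding local datasets of different sizes.
rng = np.random.default_rng(1)
clients = []
for n_i in [3, 10, 25, 7, 50]:
    X = rng.normal(size=(n_i, 4))
    clients.append((X, rng.normal(size=n_i)))

w = rng.normal(size=4)
pooled_X = np.vstack([X for X, _ in clients])
pooled_y = np.concatenate([y for _, y in clients])
federated = federated_objective(w, clients)
centralized = local_objective(w, pooled_X, pooled_y)  # same loss, computed on the pooled D
```

The identity holds because each term (|D_i| / |D|) times the mean over D_i contributes exactly that client's share of the sum over all of D; with other weightings (for example uniform p_i = 1/N) the two quantities generally differ.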

Faster Rates for Compressed Federated Learning with Client-Variance Reduction

Zhao, Burlachenko, Li et al. (2021). Preprint.


Due to the communication bottleneck in distributed and federated learning applications, algorithms using communication compression have attracted significant attention and are widely used in practice. Moreover, federated learning exhibits client variance, because the total number of heterogeneous clients is usually very large and the server is unable to communicate with all clients in each communication round. In this paper, we address these two issues together by proposing compressed and client-variance-reduced methods. Concretely, we introduce COFIG and FRECON, which combine communication compression with client-variance reduction. The total number of communication rounds of COFIG is O() in the nonconvex setting, where N is the total number of clients, S is the number of clients communicating in each round, ε is the convergence error, and ω is the parameter of the compression operator. Besides, FRECON can converge faster than COFIG in the nonconvex setting, converging within O( (1+ω)) communication rounds. In the convex setting, COFIG converges within O( (1+ω)) communication rounds, which is also the first convergence result for compression schemes that do not communicate with all the clients in each round. In sum, neither COFIG nor FRECON needs to communicate with all the clients, and both provide first/faster convergence results for convex and nonconvex federated learning, whereas previous works either require communication with all clients (and are thus impractical) or obtain worse convergence results.
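The abstract leaves the compression operator generic, parameterized by ω, and has only S of the N clients communicate per round. As a hedged illustration, the sketch below uses random-k sparsification, a standard unbiased compressor whose variance parameter is ω = d/k - 1, together with uniform sampling of S clients. It is not COFIG or FRECON (neither the client-variance-reduction step nor the rates are reproduced), and the helper names are hypothetical.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased random-k sparsifier: keep k random coordinates, scaled by d / k."""
    d = x.size
    keep = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x, dtype=float)
    out[keep] = x[keep] * (d / k)
    return out

def omega_rand_k(d, k):
    """Variance parameter: E||C(x) - x||^2 = omega * ||x||^2 with omega = d / k - 1."""
    return d / k - 1

def compressed_partial_round(client_vectors, S, k, rng):
    """Sample S of N clients uniformly and average their compressed vectors."""
    N = len(client_vectors)
    chosen = rng.choice(N, size=S, replace=False)
    return np.mean([rand_k(client_vectors[i], k, rng) for i in chosen], axis=0)

# Usage: N = 20 clients with d = 10 dimensional updates; S = 5 communicate per round.
rng = np.random.default_rng(0)
updates = [rng.normal(size=10) for _ in range(20)]
estimate = compressed_partial_round(updates, S=5, k=2, rng=rng)
```

Because both the compressor and uniform sampling are unbiased, the server's average is an unbiased estimate of the mean update over all N clients, but its variance grows as ω increases and as S shrinks; that extra variance is what the client-variance-reduced methods above aim to control.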
