Optimal Client Sampling for Federated Learning

Chen, Wenlin; Horváth, Samuel; Richtárik, Peter

doi:10.48550/arxiv.2010.13723

Cited by 24 publications (40 citation statements)

References 19 publications

“…7. However, Ranklist-Multi-UCB seems … 2) Large number of clients (10) per round: For ten clients per round, Fig. 8 shows that the overall performance of all three policies is similar to that of FedAvg with five clients per round.…”

Section: Experiments Results (mentioning)

confidence: 92%

Multi-Model Federated Learning

Bhuyan, Moharir

2022

Preprint

Federated learning is a form of distributed learning whose key challenge is the non-identically distributed nature of the data held by the participating clients. In this paper, we extend federated learning to the setting where multiple unrelated models are trained simultaneously. Specifically, every client can train any one of M models at a time, and the server maintains each of the M models, typically as a suitably averaged version of the models computed by the clients. We propose multiple policies for assigning learning tasks to clients over time. In the first policy, we extend the widely studied FedAvg to multi-model learning by allotting models to clients in an i.i.d. stochastic manner. In addition, we propose two new policies for client selection in a multi-model federated setting, which make decisions based on the current local losses for each client-model pair. We compare the performance of the policies on tasks involving synthetic and real-world data and characterize the performance of the proposed policies. The key takeaway from our work is that the proposed multi-model policies perform better than, or at least as well as, single-model training using FedAvg.
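The first policy described above (i.i.d. stochastic allocation of models to clients, followed by per-model averaging on the server) can be illustrated with a minimal Python sketch. It assumes that each client draws its model uniformly at random (the abstract only states that the allocation is i.i.d.) and that each server model becomes the FedAvg-style weighted average of the parameters returned by the clients that trained it. The function names and the flat-list parameter representation are hypothetical, and the loss-based policies are not modeled.

```python
import random


def assign_models_iid(num_clients, num_models, rng):
    """Give each client one of num_models models, drawn i.i.d. (uniform here by assumption)."""
    return [rng.randrange(num_models) for _ in range(num_clients)]


def aggregate_per_model(server_models, assignments, client_params, client_weights):
    """FedAvg-style weighted averaging done separately for each of the M server models.

    server_models[j]: current parameters of model j (list of floats).
    assignments[k]: index of the model that client k trained this round.
    client_params[k]: parameters client k returned for that model.
    client_weights[k]: aggregation weight of client k (e.g. its local sample count).
    A model that no client trained this round keeps its previous parameters.
    """
    new_models = []
    for j, old in enumerate(server_models):
        members = [k for k, a in enumerate(assignments) if a == j]
        total = sum(client_weights[k] for k in members)
        if total == 0:
            new_models.append(list(old))
            continue
        avg = [0.0] * len(old)
        for k in members:
            for d, value in enumerate(client_params[k]):
                avg[d] += client_weights[k] * value / total
        new_models.append(avg)
    return new_models
```

In a full round, the server would call `assign_models_iid`, let each client train its assigned model locally (omitted here), and then call `aggregate_per_model` to update all M models at once.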

“…Partial participation is a necessity in the cross-device regime where the training is performed over a very large number of clients (i.e., M is very large) most of which will only participate in the entire training procedure at most once. Sampling of clients to form a cohort can be done adaptively so as to choose the most informative clients (Chen et al, 2020).…”

Section: Ingredients Of Successful Federated Learning Methods (mentioning)

confidence: 99%

Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization

Malinovsky, Mishchenko, Richtárik

2022

Preprint

We present a theoretical study of server-side optimization in federated learning. Our results are the first to show that the widely popular heuristic of scaling the client updates with an extra parameter is very useful in the context of Federated Averaging (FedAvg) with local passes over the client data. Each local pass is performed without replacement using Random Reshuffling, which is a key reason we can show improved complexities. In particular, we prove that whenever the local stepsizes are small, and the update direction is given by FedAvg in conjunction with Random Reshuffling over all clients, one can take a big leap in the obtained direction and improve rates for convex, strongly convex, and non-convex objectives. Notably, in the non-convex regime we improve the rate of convergence from $\mathcal{O}(\varepsilon^{-3})$ to $\mathcal{O}(\varepsilon^{-2})$. This result is new even for Random Reshuffling performed on a single node. In contrast, if the local stepsizes are large, we prove that the noise of client sampling can be controlled by using a small server-side stepsize. To the best of our knowledge, this is the first time that local steps provably help to overcome the communication bottleneck. Together, our results on the advantage of large and small server-side stepsizes give a formal justification for the practice of adaptive server-side optimization in federated learning. Moreover, we consider a variant of our algorithm that supports partial client participation, which makes the method more practical.
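To illustrate the server-side stepsize idea, here is a toy Python sketch, not the authors' algorithm or analysis. It assumes scalar quadratic losses f_a(x) = 0.5 (x - a)^2 for each data point a, one Random Reshuffling pass per client per round, full participation, and a server update x ← x + η_s · (mean client displacement). All names (`local_pass_rr`, `fedavg_rr_server_step`, `run`) are hypothetical.

```python
import random


def local_pass_rr(x, data, lr, rng):
    """One epoch of SGD over a client's data with Random Reshuffling (each point used once).

    Toy loss per data point a: f_a(x) = 0.5 * (x - a)**2, whose gradient is (x - a).
    """
    order = list(data)
    rng.shuffle(order)  # a fresh permutation each pass: sampling without replacement
    for a in order:
        x -= lr * (x - a)
    return x


def fedavg_rr_server_step(x, client_data, lr, server_lr, rng):
    """Average the clients' displacements and scale the result by the server stepsize."""
    deltas = [local_pass_rr(x, data, lr, rng) - x for data in client_data]
    return x + server_lr * sum(deltas) / len(deltas)


def run(server_lr, rounds, lr=0.01, seed=0):
    """Run the toy problem; the global minimizer is the mean of all points, 3.5."""
    rng = random.Random(seed)
    clients = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    x = 0.0
    for _ in range(rounds):
        x = fedavg_rr_server_step(x, clients, lr, server_lr, rng)
    return x
```

With the small local stepsize lr = 0.01, each round moves the averaged direction only a small fraction of the way toward the minimizer, so after 20 rounds `run(10.0, 20)` ends far closer to 3.5 than `run(1.0, 20)`. In this toy problem (three points per client) the server step is stable only while server_lr * (1 - (1 - lr)**3) < 2, which is consistent with the abstract's point that a big leap is useful when the local stepsizes are small.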

“…Specifically, clients with "important" data would have higher probabilities to be sampled in each round. For example, existing works use clients' local gradient information (e.g., [25]- [27]) or local losses (e.g., [28]) to measure the importance of clients' data. However, these schemes did not consider the speed of error convergence with respect to wall-clock time, especially the straggling effect due to heterogeneous transmission delays.…”

Section: Related Work (mentioning)

confidence: 99%

“…One effective way of speeding up the convergence with respect to the number of training rounds is to choose clients according to some sampling distribution where "important" clients have high probabilities [21]- [24]. For example, recent works adopted importance sampling approaches based on clients' statistical property [25]- [28]. However, their sampling schemes did not account for the heterogeneous physical time in each round, especially under straggling circumstances.…”

Section: Introduction (mentioning)

confidence: 99%

Tackling System and Statistical Heterogeneity for Federated Learning with Adaptive Client Sampling

Luo, Xiao, Wang, et al.

2021

Preprint

Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large and the server's communication bandwidth is limited. Recent works on the convergence analysis of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which suffers from slow wall-clock convergence due to high degrees of system heterogeneity and statistical heterogeneity. This paper aims to design an adaptive client sampling algorithm that tackles both system and statistical heterogeneity to minimize the wall-clock convergence time. We obtain a new tractable convergence bound for FL algorithms with arbitrary client sampling probabilities. Based on the bound, we analytically establish the relationship between the total learning time and the sampling probabilities, which results in a non-convex optimization problem for training time minimization. We design an efficient algorithm for learning the unknown parameters in the convergence bound and develop a low-complexity algorithm to approximately solve the non-convex problem. Experimental results from both a hardware prototype and simulation demonstrate that our proposed sampling scheme significantly reduces the convergence time compared to several baseline sampling schemes. Notably, our scheme in the hardware prototype spends 73% less time than the uniform sampling baseline to reach the same target loss. (Footnote 3: We use wall-clock time to distinguish it from the number of training rounds.)
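Several of the works above sample "important" clients with higher probability (for instance, using local gradient information) and allow arbitrary sampling probabilities. A common way to keep the aggregated update unbiased under such sampling is inverse-probability weighting. The Python sketch below assumes independent (Poisson) inclusion of each client, an expected cohort size m, and probabilities proportional to a nonnegative importance score capped at 1. It is a generic illustration, not the specific rule of Chen et al. or Luo et al. (Luo et al. additionally account for wall-clock time, which is not modeled here), and the function names are hypothetical.

```python
import random


def capped_proportional_probs(scores, m):
    """Inclusion probabilities proportional to nonnegative scores, capped at 1, summing to m."""
    n = len(scores)
    if not 0 < m <= n:
        raise ValueError("expected cohort size m must satisfy 0 < m <= n")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be nonnegative")
    probs = [0.0] * n
    capped = set()
    while True:
        free = [i for i in range(n) if i not in capped]
        if not free:
            break
        budget = m - len(capped)  # probability mass left for the uncapped clients
        total = sum(scores[i] for i in free)
        if total == 0:
            # No signal among the remaining clients: spread the budget evenly.
            for i in free:
                probs[i] = budget / len(free)
            break
        over = []
        for i in free:
            probs[i] = budget * scores[i] / total
            if probs[i] >= 1.0:
                over.append(i)
        if not over:
            break
        for i in over:  # cap at 1 and redistribute the remaining budget next pass
            probs[i] = 1.0
            capped.add(i)
    return probs


def sample_and_aggregate(updates, weights, probs, rng):
    """Include client i independently with probability probs[i].

    Returns the sampled cohort and an unbiased estimate of sum_i weights[i] * updates[i].
    """
    cohort = []
    estimate = 0.0
    for i, (u, w, p) in enumerate(zip(updates, weights, probs)):
        if p > 0 and rng.random() < p:
            cohort.append(i)
            estimate += w * u / p  # inverse-probability weighting removes the sampling bias
    return cohort, estimate
```

The estimate is unbiased for every client with a positive probability. If the scores are update norms, a client gets probability 0 only when its update is zero, so nothing is lost. Capping at 1 means clients with very large scores are always included, and the leftover budget is shared among the rest in proportion to their scores.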
