2020
DOI: 10.48550/arxiv.2012.04061
Preprint

Faster Non-Convex Federated Learning via Global and Local Momentum

Cited by 8 publications (12 citation statements)
References 0 publications
“…Despite its success, FedAvg suffers from the large heterogeneity (non-iid-ness) in the data presented on the different clients, causing drift in each client's updates and resulting in slow and unstable convergence (Karimireddy et al, 2020b). To address this issue, a new line of study has been suggested lately that either simulates the distribution of the whole dataset using preassigned weights of clients (Wang et al, 2020;Reisizadeh et al, 2020;Mohri et al, 2019;Li et al, 2020a) or adopts variance reduction methods (Karimireddy et al, 2020b;a;Das et al, 2020;Haddadpour et al, 2021).…”
Section: Introduction (mentioning)
confidence: 99%
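
As background for the client-drift issue described in the statement above, here is a minimal single-process sketch of a FedAvg-style round with preassigned aggregation weights. It is an illustrative sketch only; none of the function names, data, or hyperparameters are taken from the cited papers.

```python
import numpy as np

def local_sgd(x_global, grad_fn, data, lr=0.1, steps=10):
    # Each client starts from the global model and runs several SGD steps on
    # its own data; under non-iid data the iterate drifts toward the client's
    # local optimum, which is the "client drift" discussed above.
    x = x_global.copy()
    for _ in range(steps):
        batch = data[np.random.randint(len(data))]
        x = x - lr * grad_fn(x, batch)
    return x

def fedavg_round(x_global, client_data, grad_fn, weights):
    # One communication round: weighted average of the clients' local models,
    # with preassigned weights (e.g., proportional to local dataset sizes).
    local_models = [local_sgd(x_global, grad_fn, d) for d in client_data]
    return sum(w * x for w, x in zip(weights, local_models))

# Toy usage: two clients whose objectives pull toward different points.
grad = lambda x, c: 2.0 * (x - c)                    # gradient of ||x - c||^2
clients = [np.array([[1.0, 1.0]]), np.array([[-1.0, 3.0]])]
x = np.zeros(2)
for _ in range(20):
    x = fedavg_round(x, clients, grad, weights=[0.5, 0.5])
```
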
“…Local methods. Several recent papers have studied local algorithms combined with variance reduction techniques (Sharma et al, 2019;Das et al, 2020;Khanduri et al, 2021;Karimireddy et al, 2020a). Sharma et al (2019) have proposed a local variant of SPIDER (Fang et al, 2018) and shown that the proposed algorithm achieves the optimal total computational complexity.…”
Section: Related Work (mentioning)
confidence: 99%
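
For reference, the sketch below shows the single-machine form of a SPIDER-type recursive gradient estimator that such local variance-reduction methods build on. The federated/local variants from the cited works are not reproduced here, and full_grad, stoch_grad, sample, and the refresh period q are illustrative assumptions.

```python
import numpy as np

def spider(x0, full_grad, stoch_grad, sample, lr=0.05, q=10, iters=100):
    # v tracks the gradient: it is refreshed with a full (or large-batch)
    # gradient every q steps and updated recursively in between, which keeps
    # its variance much smaller than that of a plain stochastic gradient.
    x, x_prev = x0.copy(), x0.copy()
    v = full_grad(x)
    for t in range(iters):
        if t % q == 0:
            v = full_grad(x)
        else:
            xi = sample()
            v = v + stoch_grad(x, xi) - stoch_grad(x_prev, xi)
        x_prev, x = x, x - lr * v
    return x

# Toy usage: mean of squared distances to a few points c_i.
cs = np.array([[0.0, 1.0], [2.0, -1.0], [1.0, 1.0]])
full_g = lambda x: np.mean(2.0 * (x - cs), axis=0)
stoch_g = lambda x, i: 2.0 * (x - cs[i])
draw = lambda: np.random.randint(len(cs))
x_star = spider(np.zeros(2), full_g, stoch_g, draw)
```
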
“…several approaches use control variates [10,11,18,20] or global gradient momentum [30] to reduce biases in client updates. [5,13] apply the STORM algorithm [4] to reduce variance caused by both server-level and client-level SGD procedures. Another way to de-bias client updates is to estimate the global posterior using local posterior sampling by running Markov chain Monte Carlo (MCMC) on the client side [2].…”
Section: Related Work (mentioning)
confidence: 99%
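
For context, the following is a minimal single-machine sketch of the STORM estimator referenced above; how [5, 13] split it between the server and client levels is not shown, and stoch_grad, sample, and the mixing weight a are illustrative assumptions.

```python
import numpy as np

def storm(x0, stoch_grad, sample, lr=0.05, a=0.1, iters=100):
    # d is a momentum-like gradient estimate; the correction term
    # (d - stoch_grad(x_prev, xi)) cancels most of the stochastic noise
    # without requiring periodic full-gradient computations.
    x = x0.copy()
    xi = sample()
    d = stoch_grad(x, xi)
    for _ in range(iters):
        x_prev, x = x, x - lr * d
        xi = sample()
        d = stoch_grad(x, xi) + (1.0 - a) * (d - stoch_grad(x_prev, xi))
    return x

# Toy usage with the same squared-distance objective as in the sketch above.
cs = np.array([[0.0, 1.0], [2.0, -1.0], [1.0, 1.0]])
stoch_g = lambda x, i: 2.0 * (x - cs[i])
draw = lambda: np.random.randint(len(cs))
x_star = storm(np.zeros(2), stoch_g, draw)
```
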
“…To further validate the effectiveness of the proposed method in handling client heterogeneity, we perform experiments with a large number of clients, which is a more realistic federated learning scenario. In this setting, since the total number of clients is increased by 5, the compared methods perform as follows (accuracy; communication rounds to reach the target accuracy):

[21]            64.54    1000+ (> 1.43×)
FedProx [17]    65.47    1000+ (> 1.43×)
FedAvgm [8]     63.73    1000+ (> 1.43×)
FedAdam [23]    69.29    1000+ (> 1.43×)
FedDyn [1]      72.18    854 (1.22×)
FedCM [30]      55.03    1000+ (> 1.43×)
FedAGM (ours)   …”
Section: Evaluation on a Large Number of Clients (mentioning)
confidence: 99%