2019
DOI: 10.48550/arxiv.1902.07848
Preprint

Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training

Abstract: Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models. As data move from cloud-centric storage to edge nodes, a major challenge for distributed machine learning systems is how to handle natively non-independent and identically distributed (non-IID) training data. Previous asynchronous training methods do not perform satisfactorily on non-IID data because it would result in that the t…

Cited by 5 publications (11 citation statements) | References 15 publications
“…On the server side, Li et al. [15] applied momentum uniformly to the gradients of all clients to stabilize the training process under a non-IID scenario. However, collecting gradients from clients might require more frequent communications than collecting models from clients.…”
Section: Related Work
confidence: 99%
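The statement above describes the server applying a single global momentum term uniformly to every client's incoming gradient. Below is a minimal sketch of that idea; the class name, hyperparameters, and exact update rule are illustrative assumptions, not taken from [15].

import numpy as np

class MomentumServer:
    """Sketch of a server keeping one global momentum buffer for all clients."""

    def __init__(self, model_dim, lr=0.01, beta=0.9):
        self.weights = np.zeros(model_dim)   # shared global model
        self.velocity = np.zeros(model_dim)  # single global momentum buffer
        self.lr = lr
        self.beta = beta

    def apply_client_gradient(self, grad):
        # The same momentum term is mixed into every client's gradient,
        # regardless of which (possibly non-IID) client produced it.
        self.velocity = self.beta * self.velocity + grad
        self.weights -= self.lr * self.velocity
        return self.weights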
“…To the best of our knowledge, there is no gold standard for evaluating federated algorithms. Generally, there are three ways to split the data into training and test sets: splitting all data globally [7,15,26], splitting each client's local data [1,4,24], and splitting clients into training/test groups [5,8,23]. In this work, we adopt the last strategy, assuming that no local data can be collected by the server and that the server cannot manipulate the clients' local data.…”
Section: Evaluation Setup
confidence: 99%
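A compact illustration of the three splitting strategies listed in the statement above, assuming the data live in a dict mapping client id to a (features, labels) pair; all function names and the split ratio are illustrative assumptions, not code from the cited papers.

import random

def split_globally(clients, test_frac=0.2, seed=0):
    """Pool all samples from all clients, then split the pool into train/test."""
    rng = random.Random(seed)
    samples = [s for data in clients.values() for s in zip(*data)]
    rng.shuffle(samples)
    cut = int(len(samples) * (1 - test_frac))
    return samples[:cut], samples[cut:]

def split_locally(clients, test_frac=0.2):
    """Split each client's own data into a local train/test pair."""
    out = {}
    for cid, (x, y) in clients.items():
        cut = int(len(x) * (1 - test_frac))
        out[cid] = ((x[:cut], y[:cut]), (x[cut:], y[cut:]))
    return out

def split_by_client(clients, test_frac=0.2, seed=0):
    """Hold out whole clients for testing (the last strategy mentioned above)."""
    rng = random.Random(seed)
    ids = list(clients)
    rng.shuffle(ids)
    cut = int(len(ids) * (1 - test_frac))
    return {c: clients[c] for c in ids[:cut]}, {c: clients[c] for c in ids[cut:]}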
“…Though K-async FL can mitigate the straggler effect and save total training time [23], there are still two obstacles to be overcome in practice. On the one hand, non-IID datasets generated at different FL clients can impact the model utility [30], [31]. On the other hand, stale gradients may harm the model utility, or even cause the training process to diverge [32], [33].…”
Section: Introduction
confidence: 99%
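A rough sketch tying the two points above together: a server that applies an update as soon as the k earliest client updates arrive (so stragglers are not awaited), while down-weighting gradients computed against an older model version. The staleness decay and function signature are assumptions for illustration, not an algorithm from [23], [32], or [33].

import numpy as np

def k_async_step(weights, updates, current_version, lr=0.01, k=3):
    """Apply one update once the k fastest clients have reported.

    updates: list of (gradient, base_version) tuples from clients, in arrival order.
    Stale gradients (computed on an old model version) get smaller weights,
    so they cannot dominate the aggregate and destabilize training.
    """
    assert len(updates) >= k
    selected = updates[:k]  # the k earliest arrivals; stragglers are ignored this round
    agg = np.zeros_like(weights)
    total = 0.0
    for grad, base_version in selected:
        staleness = current_version - base_version
        w = 1.0 / (1.0 + staleness)  # simple staleness decay (an assumption)
        agg += w * grad
        total += w
    return weights - lr * (agg / total), current_version + 1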
“…The solutions to these two problems have been studied separately. For non-IID data, the essence of the existing solutions, e.g., momentum [31], [34] and variance reduction [35], is to fully utilize all available information for estimating the global data distribution. Hence, gradients from as many clients as possible need to be aggregated so that the aggregated gradients represent the overall data comprehensively.…”
Section: Introduction
confidence: 99%
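For the variance-reduction idea mentioned in the statement above, a loose control-variate sketch in the spirit of SVRG/SCAFFOLD-style methods: each client corrects its local gradient so the aggregate better tracks the global data distribution. The correction rule and names here are illustrative assumptions, not the method of [35].

import numpy as np

def corrected_client_gradient(local_grad, client_control, global_control):
    """Remove the client's habitual bias and re-center on the global average."""
    return local_grad - client_control + global_control

def update_client_control(client_control, local_grad, step=0.1):
    """Drift the client's control variate toward its recent local gradients."""
    return (1 - step) * client_control + step * local_grad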