2019
DOI: 10.48550/arxiv.1902.07848
Preprint

Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training

Abstract: Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models. As data move from cloud-centric storage to edge nodes, a major challenge for distributed machine learning systems is how to handle natively non-independent and identically distributed (non-IID) training data. Previous asynchronous training methods do not perform satisfactorily on non-IID data because it would result in that the t…

Cited by 5 publications (11 citation statements) | References 15 publications
“…On the server side, Li et al. [15] applied momentum uniformly to the gradients of all clients to stabilize the training process under a non-IID scenario. However, collecting gradients from clients might require more frequent communications than collecting models from clients.…”
Section: Related Work
confidence: 99%
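The statement above describes the server applying a single global momentum term uniformly to every client's incoming gradient. Below is a minimal sketch of that idea; the class name, hyperparameters, and exact update rule are illustrative assumptions, not taken from [15].

import numpy as np

class MomentumServer:
    """Sketch of a server keeping one global momentum buffer for all clients."""

    def __init__(self, model_dim, lr=0.01, beta=0.9):
        self.weights = np.zeros(model_dim)   # shared global model
        self.velocity = np.zeros(model_dim)  # single global momentum buffer
        self.lr = lr
        self.beta = beta

    def apply_client_gradient(self, grad):
        # The same momentum term is mixed into every client's gradient,
        # regardless of which (possibly non-IID) client produced it.
        self.velocity = self.beta * self.velocity + grad
        self.weights -= self.lr * self.velocity
        return self.weights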
“…To the best of our knowledge, there is no gold standard for evaluating federated algorithms. Generally, there are three ways to split the data into training and test sets: splitting all data globally [7,15,26], splitting each client's local data [1,4,24], and splitting clients into training/test groups [5,8,23]. In this work, we adopt the last strategy, assuming that no local data can be collected by the server and that the server cannot manipulate the clients' local data.…”
Section: Evaluation Setup
confidence: 99%
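A compact illustration of the three splitting strategies listed in the statement above, assuming the data live in a dict mapping client id to a (features, labels) pair; all function names and the split ratio are illustrative assumptions, not code from the cited papers.

import random

def split_globally(clients, test_frac=0.2, seed=0):
    """Pool all samples from all clients, then split the pool into train/test."""
    rng = random.Random(seed)
    samples = [s for data in clients.values() for s in zip(*data)]
    rng.shuffle(samples)
    cut = int(len(samples) * (1 - test_frac))
    return samples[:cut], samples[cut:]

def split_locally(clients, test_frac=0.2):
    """Split each client's own data into a local train/test pair."""
    out = {}
    for cid, (x, y) in clients.items():
        cut = int(len(x) * (1 - test_frac))
        out[cid] = ((x[:cut], y[:cut]), (x[cut:], y[cut:]))
    return out

def split_by_client(clients, test_frac=0.2, seed=0):
    """Hold out whole clients for testing (the last strategy mentioned above)."""
    rng = random.Random(seed)
    ids = list(clients)
    rng.shuffle(ids)
    cut = int(len(ids) * (1 - test_frac))
    return {c: clients[c] for c in ids[:cut]}, {c: clients[c] for c in ids[cut:]}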
“…Though K-async FL can mitigate the straggler effect and save total training time [23], there are still two obstacles to be overcome in practice. On the one hand, non-IID datasets generated at different FL clients can impact the model utility [30], [31]. On the other hand, stale gradients may harm the model utility, or even cause the training process to diverge [32], [33].…”
Section: Introduction
confidence: 99%
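A rough sketch tying the two points above together: a server that applies an update as soon as the k earliest client updates arrive (so stragglers are not awaited), while down-weighting gradients computed against an older model version. The staleness decay and function signature are assumptions for illustration, not an algorithm from [23], [32], or [33].

import numpy as np

def k_async_step(weights, updates, current_version, lr=0.01, k=3):
    """Apply one update once the k fastest clients have reported.

    updates: list of (gradient, base_version) tuples from clients, in arrival order.
    Stale gradients (computed on an old model version) get smaller weights,
    so they cannot dominate the aggregate and destabilize training.
    """
    assert len(updates) >= k
    selected = updates[:k]  # the k earliest arrivals; stragglers are ignored this round
    agg = np.zeros_like(weights)
    total = 0.0
    for grad, base_version in selected:
        staleness = current_version - base_version
        w = 1.0 / (1.0 + staleness)  # simple staleness decay (an assumption)
        agg += w * grad
        total += w
    return weights - lr * (agg / total), current_version + 1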
“…The solutions to these two problems have been studied separately. For non-IID data, the essence of the existing solutions, e.g., momentum [31], [34] and variance reduction [35], is to fully utilize all available information for estimating the global data distribution. Hence, gradients from as many clients as possible need to be aggregated so that the aggregated gradients represent the overall data comprehensively.…”
Section: Introduction
confidence: 99%
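For the variance-reduction idea mentioned in the statement above, a loose control-variate sketch in the spirit of SVRG/SCAFFOLD-style methods: each client corrects its local gradient so the aggregate better tracks the global data distribution. The correction rule and names here are illustrative assumptions, not the method of [35].

import numpy as np

def corrected_client_gradient(local_grad, client_control, global_control):
    """Remove the client's habitual bias and re-center on the global average."""
    return local_grad - client_control + global_control

def update_client_control(client_control, local_grad, step=0.1):
    """Drift the client's control variate toward its recent local gradients."""
    return (1 - step) * client_control + step * local_grad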