HSAC-ALADMM: an asynchronous lazy ADMM algorithm based on hierarchical sparse allreduce communication

Wang, Dongxia; Lei, Yongmei; Xie, Jinyang; Wang, Guozheng

doi:10.1007/s11227-020-03590-7

Cited by 6 publications

(2 citation statements)

References 26 publications


“…Many methods have been proposed to deal with these challenges. Hasanov et al [6] utilize the hierarchical idea to design the MPI reduction algorithm, and Wang et al [7] design an asynchronous ADMM algorithm based on a hierarchical view. Xie et al [8] design a parameter synchronization architecture for the ADMM algorithm that combines a hierarchical architecture with Ring All-Reduce.…”

Section: ) Min (mentioning)

confidence: 99%


A Communication Efficient ADMM-based Distributed Algorithm Using Two-Dimensional Torus Grouping AllReduce

et al. 2023

Self Cite


Large-scale distributed training mainly consists of sub-model parallel training and parameter synchronization. As the number of training workers grows, the efficiency of parameter synchronization suffers. To tackle this problem, we first propose 2D-TGA, a grouping AllReduce method based on the two-dimensional torus topology. This method synchronizes the model parameters by grouping and makes full use of bandwidth. Second, we propose a distributed algorithm, 2D-TGA-ADMM, which combines 2D-TGA with the alternating direction method of multipliers (ADMM). It focuses on sub-model training and reduces the wait time among workers in the synchronization process. Finally, experimental results on the Tianhe-2 supercomputing platform show that, compared with MPI_Allreduce, 2D-TGA can shorten the synchronization wait time by 33%.
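The abstract does not spell out how 2D-TGA groups workers, so the following is only a rough sketch of the general idea of grouped AllReduce on a two-dimensional grid, not the authors' method. It assumes workers are laid out row-major on a rows x cols grid and that synchronization is a sum performed first within each row group and then within each column group. The function name `grouped_allreduce_2d` and the in-process simulation (no real MPI calls) are hypothetical choices made for illustration.

```python
def grouped_allreduce_2d(params, rows, cols):
    """Simulate a two-phase grouped sum-allreduce on a rows x cols worker grid.

    params: list of per-worker parameter vectors (lists of numbers),
            worker k sits at row k // cols, column k % cols (an assumption).
    Returns the per-worker vectors after both phases.
    """
    assert len(params) == rows * cols, "one vector per worker is required"
    dim = len(params[0])

    # Copy the inputs onto the grid so the caller's data is not modified.
    grid = [[list(params[r * cols + c]) for c in range(cols)] for r in range(rows)]

    # Phase 1: every row group sums its members' vectors;
    # afterwards each worker in a row holds that row's sum.
    for r in range(rows):
        row_sum = [sum(grid[r][c][j] for c in range(cols)) for j in range(dim)]
        grid[r] = [list(row_sum) for _ in range(cols)]

    # Phase 2: every column group sums the row results;
    # afterwards each worker holds the global sum.
    for c in range(cols):
        col_sum = [sum(grid[r][c][j] for r in range(rows)) for j in range(dim)]
        for r in range(rows):
            grid[r][c] = list(col_sum)

    return [grid[r][c] for r in range(rows) for c in range(cols)]


# Four workers on a 2 x 2 grid, each holding a 2-element parameter vector.
workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
synced = grouped_allreduce_2d(workers, rows=2, cols=2)
print(synced[0])  # [16.0, 20.0]
```

In this sketch each worker only exchanges data with the members of its own row and column group, which is the property that grouping schemes of this kind rely on; how 2D-TGA schedules the transfers over torus links is not described in the abstract and is not modeled here.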


Section: ) Min (mentioning)

confidence: 99%

“…Thus, we take (x i + i ) as a whole and define it as w i , as shown in Equ. (7) Communication topology is an important factor that affects the scalability of distributed optimization algorithms. In order to minimize the synchronization time of model (6a)…”

Section: Distributed Algorithm 2D-TGA-ADMM (mentioning)

confidence: 99%