2020
DOI: 10.48550/arxiv.2005.00797
Preprint

Multi-consensus Decentralized Accelerated Gradient Descent

Abstract: This paper considers the decentralized optimization problem, which has applications in large scale machine learning, sensor networks, and control theory. We propose a novel algorithm that can achieve near optimal communication complexity, matching the known lower bound up to a logarithmic factor of the condition number of the problem. Our theoretical results give affirmative answers to the open problem on whether there exists an algorithm that can achieve a communication complexity (nearly) matching the lower …
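
To make the title concrete, here is a minimal runnable sketch of the general idea: an accelerated decentralized gradient method in which each outer iteration performs a Nesterov-style step with gradient tracking and is followed by several consensus (gossip) rounds. This is an illustration only, not the paper's algorithm; the ring network, Metropolis weights, local least-squares objectives, step size, momentum parameter, and the number of gossip rounds K are all assumptions made for this example.

```python
# Minimal sketch (illustration only; NOT the paper's algorithm): an accelerated
# decentralized gradient method with gradient tracking, where each outer
# iteration is followed by several consensus (gossip) rounds. The ring network,
# Metropolis weights, local least-squares objectives, and K are assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 8, 5, 20                        # agents, dimension, samples per agent
A = [rng.standard_normal((n, d)) for _ in range(m)]
b = [rng.standard_normal(n) for _ in range(m)]

def grad(i, x):
    # Gradient of the local objective f_i(x) = 0.5 * ||A_i x - b_i||^2.
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix of a ring graph (Metropolis weights).
W = np.zeros((m, m))
for i in range(m):
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = 1.0 / 3.0
    W[i, i] = 1.0 / 3.0

def multi_consensus(Z, K):
    # K rounds of neighbour averaging; each round costs one communication.
    for _ in range(K):
        Z = W @ Z
    return Z

# Conservative smoothness / strong-convexity estimates for the average objective.
L = max(np.linalg.eigvalsh(Ai.T @ Ai).max() for Ai in A)
mu = min(np.linalg.eigvalsh(Ai.T @ Ai).min() for Ai in A)
eta = 1.0 / L                                          # step size
beta = (1 - np.sqrt(mu / L)) / (1 + np.sqrt(mu / L))   # Nesterov momentum

K = 10                        # gossip rounds per outer iteration (assumed)
X = np.zeros((m, d))          # row i is agent i's iterate
Y = X.copy()                  # extrapolated (momentum) variable
G_prev = np.vstack([grad(i, Y[i]) for i in range(m)])
S = G_prev.copy()             # gradient-tracking variable (average-gradient estimate)

for t in range(300):
    X_new = multi_consensus(Y - eta * S, K)        # local gradient step, then gossip
    Y_new = X_new + beta * (X_new - X)             # Nesterov extrapolation
    G_new = np.vstack([grad(i, Y_new[i]) for i in range(m)])
    S = multi_consensus(S + G_new - G_prev, K)     # track the network-average gradient
    X, Y, G_prev = X_new, Y_new, G_new

x_star, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
print("distance to minimizer:", np.linalg.norm(X.mean(axis=0) - x_star))
```

With enough gossip rounds per iteration, the agents' iterates should stay close to consensus while their average approaches the minimizer of the aggregate objective; the paper's contribution concerns how few communications in total suffice for this.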

Cited by 16 publications (32 citation statements)
References 31 publications
“…• Acceleration over mesh networks: Given the focus of this work, we comment next only on distributed algorithms over mesh networks that employ some form of acceleration and are provably convergent; they are summarized in Table 1. Although substantially different (some are primal [Ye et al, 2020a, Ye et al, 2020b, Li and Lin, 2020, Rogozin et al, 2020], others are dual or penalty-based [Scaman et al, 2017, Uribe et al, 2020, Li et al, 2018] methods), applicable only to special instances of (P) (mainly with r = 0), and subject to special design constraints (e.g., a positive semidefinite gossip matrix), they all achieve a linear convergence rate, with communication complexity scaling for some with √κ_ℓ (κ_ℓ = L_mx/µ_mn is the "local" condition number) and for others with √κ (κ = L/µ is the condition number of f). Note that in general κ ≪ κ_ℓ; hence the latter group is preferable to the former.…”
Section: Related Work (mentioning)
confidence: 99%
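
A one-line justification of the final claim in this excerpt, as a standard argument under the usual assumption that f = (1/m) Σ_i f_i with each f_i being L_i-smooth and µ_i-strongly convex, and L_mx = max_i L_i, µ_mn = min_i µ_i (these definitions are inferred, not taken from the excerpt):

```latex
% kappa never exceeds kappa_ell: averaging the local functions cannot worsen conditioning.
\[
  L \;\le\; \frac{1}{m}\sum_{i=1}^{m} L_i \;\le\; L_{\mathrm{mx}},
  \qquad
  \mu \;\ge\; \frac{1}{m}\sum_{i=1}^{m} \mu_i \;\ge\; \mu_{\mathrm{mn}},
  \qquad\Longrightarrow\qquad
  \kappa \;=\; \frac{L}{\mu} \;\le\; \frac{L_{\mathrm{mx}}}{\mu_{\mathrm{mn}}} \;=\; \kappa_{\ell}.
\]
```

The gap between κ and κ_ℓ can be large when the local functions are heterogeneous, which is why complexity bounds scaling with √κ are preferable to bounds scaling with √κ_ℓ.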
“…OPAPC, Accelerated Dual Ascent [Uribe et al, 2020, Alg. 3], APM-C [Li et al, 2018], Mudag [Ye et al, 2020a], Accelerated EXTRA [Li and Lin, 2020], DAccGD [Rogozin et al, 2020], and DPAG [Ye et al, 2020b]. L (resp.…”
Section: Contributions (mentioning)
confidence: 99%
“…It was shown in [46] that to obtain ε-optimal solutions, the gradient computation complexity is lower bounded by O(√κ log(1/ε)), and the communication complexity is lower bounded by O(√(κ/θ) log(1/ε)). To obtain better complexities, many accelerated decentralized gradient-type methods have been developed (e.g., [11,12,16,18,20,21,22,23,24,38,41,42,43,46,54,57,61,62]). There exist dual-based methods such as [46] that achieve optimal complexities.…”
mentioning
confidence: 99%
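
For context, a back-of-the-envelope account of how a multi-consensus scheme can approach the communication lower bound quoted above. It assumes θ is the spectral gap of the gossip matrix (the excerpt does not define θ) and that each outer iteration of an accelerated method runs on the order of θ^{-1/2} log κ accelerated gossip rounds; both are assumptions made for illustration, not statements from the excerpt:

```latex
% Illustrative accounting: outer iterations times gossip rounds per iteration.
\[
  \underbrace{O\!\left(\sqrt{\kappa}\,\log\frac{1}{\epsilon}\right)}_{\text{outer (gradient) iterations}}
  \;\times\;
  \underbrace{O\!\left(\frac{1}{\sqrt{\theta}}\,\log\kappa\right)}_{\text{gossip rounds per iteration}}
  \;=\;
  O\!\left(\sqrt{\frac{\kappa}{\theta}}\,\log\kappa\,\log\frac{1}{\epsilon}\right).
\]
```

This exceeds the quoted communication lower bound only by a log κ factor, consistent with the abstract's claim of matching the lower bound up to a logarithmic factor of the condition number.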
“…In this paper, we focus on dual-free methods or gradient-type methods only. Some algorithms, for instance [16,22,23,41,42,61,62], rely on inner loops to guarantee desirable convergence rates. However, inner loops place a larger communication burden [24,38], which may limit the applicability of these methods, since communication has often been recognized as the major bottleneck in distributed or decentralized optimization.…”
mentioning
confidence: 99%