Distributed dual averaging for convex optimization under communication delays

Tsianos, Konstantinos I.; Rabbat, Michael

doi:10.1109/acc.2012.6315289

Cited by 95 publications

(73 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In [8] it is proven that restricting to doubly stochastic consensus protocols in distributed dual averaging is not necessary and it is still possible to converge to the optimum with a general row stochastic protocol P . However there are multiple reasons why using a row stochastic matrix may not be desirable.…”

Section: Push-sum Consensusmentioning

confidence: 99%

“…However there are multiple reasons why using a row stochastic matrix may not be desirable. The bias correction described in [8] requires knowledge of the stationary distribution of P in advance which is restrictive. Moreover, with a time-varying consensus protocol P (t), we may not even be able to specify the stationary distribution beyond its expectation and variance [9] or may only be able to achieve average consensus in expectation [10].…”

Section: Push-sum Consensusmentioning

confidence: 99%

“…The result holds for one directional communication using arbitrary column stochastic consensus protocols P that respect the structure of the strongly connected network. In contrast to [8], the bias in the optimization introduced by P is corrected without requiring knowledge of the stationary distribution of P or the network size. In the last part of the paper, we discuss how our convergence result can be extended to accommodate communication delays and a fully asynchronous version of the algorithm.…”

Section: Introductionmentioning

confidence: 99%

“…For example, not all (directed) networks admit a doubly stochastic matrix [6]. However, relinquishing double stochasticity can introduce bias in the optimization [7], [8]. Moreover, it is desirable to only rely on one directional communication between nodes because in a bi-directional case where each node blocks until it receives a response, deadlocks can occur when the network has cycles.…”

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

Push-Sum Distributed Dual Averaging for convex optimization

Tsianos

Lawlor

Rabbat

2012

2012 IEEE 51st IEEE Conference on Decision and Control (CDC)

254

164

View full text Add to dashboard Cite

Recently there has been a significant amount of research on developing consensus based algorithms for distributed optimization motivated by applications that vary from large scale machine learning to wireless sensor networks. This work describes and proves convergence of a new algorithm called Push-Sum Distributed Dual Averaging which combines a recent optimization algorithm [1] with a push-sum consensus protocol [2]. As we discuss, the use of push-sum has significant advantages. Restricting to doubly stochastic consensus protocols is not required and convergence to the true average consensus is guaranteed without knowing the stationary distribution of the update matrix in advance. Furthermore, the communication semantics of just summing the incoming information make this algorithm truly asynchronous and allow a clean analysis when varying intercommunication intervals and communication delays are modelled. We include experiments in simulation and on a small cluster to complement the theoretical analysis.

show abstract

Section: Push-sum Consensusmentioning

confidence: 99%

Section: Push-sum Consensusmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Push-Sum Distributed Dual Averaging for convex optimization

Tsianos

Lawlor

Rabbat

2012

2012 IEEE 51st IEEE Conference on Decision and Control (CDC)

254

164

View full text Add to dashboard Cite

show abstract

“…(Here by private cost f i (·) we mean that the function f i (·) is known only by node i.) Existing distributed (sub)gradient algorithms, e.g., the algorithm proposed in [2] and extended and analyzed in [3], [2], [4], [5], [6], [7], and the one proposed in [8] and extended and analyzed in [9], [10], converge slowly. For example, assuming possibly non-differentiable, convex f i 's, with bounded gradients over the constraint set, algorithm [2] with constant step size ↵, after k iterations, has the error in the cost O(↵+1/(↵k)), which, for the optimized ↵, gives O(1/✏ 2 ) convergence time.…”

Section: Introductionmentioning

confidence: 99%

Distributed Nesterov-like gradient algorithms

Jakovetić

Moura

2012

2012 IEEE 51st IEEE Conference on Decision and Control (CDC)

View full text Add to dashboard Cite

Abstract-In classical, centralized optimization, the Nesterov gradient algorithm reduces the number of iterations to produce an ✏-accurate solution (in terms of the cost function) with respect to ordinary gradient from O(1/✏) to O(1/ p ✏). This improvement is achieved on a class of convex functions with Lipschitz continuous first derivative, and it comes at a very small additional computational cost per iteration. In this paper, we consider distributed optimization, where nodes in the network cooperatively minimize the sum of their private costs subject to a global constraint. To solve this problem, recent literature proposes distributed (sub)gradient algorithms, that are attractive due to computationally inexpensive iterations, but that converge slowly-the ✏ error is achieved in O(1/✏ 2 ) iterations. Here, building from the Nesterov gradient algorithm, we present a distributed, constant step size, Nesterov-like gradient algorithm that converges much faster than existing distributed (sub)gradient methods, with zero additional communications and very small additional computations per iteration k. We show that our algorithm converges to a solution neighborhood, such that, for a convex compact constraint set and optimized stepsize, the convergence time is O(1/✏). We achieve this on a class of convex, coercive, continuously differentiable private costs with Lipschitz first derivative. We derive our algorithm through a useful penalty, network's Laplacian matrix-based reformulation of the original problem (referred to as the clone problem) -the proposed method is precisely the Nesterovgradient applied on the clone problem. Finally, we illustrate the performance of our algorithm on distributed learning of a classifier via logistic loss.

show abstract

Distributed zero‐gradient‐sum algorithm for convex optimization with time‐varying communication delays and switching networks

Guo

Chen

2018

Intl J Robust & Nonlinear

View full text Add to dashboard Cite

Summary The distributed convex optimization problem subject to time‐varying communication delays and switching network topologies is addressed in this paper. Based on continuous‐time Zero‐Gradient‐Sum scheme, the novel distributed algorithms are proposed to minimize the global objective function which is composed of a sum of strictly convex local cost functions. In the fixed network topology case, by constructing a new Lyapunov‐Krasovskii function, two explicit sufficient conditions for the maximum admissible time delay are derived to guarantee that all agents' states converge to the optimal solution. In the switching network topology case, the stability condition is derived by the common Lyapunov function theory. In addition, two sufficient conditions about the maximum admissible time delays are also derived for the fixed and switching weight‐balanced network topologies, respectively. Several simulation tests are used to illustrate the effectiveness of our obtained theoretical results.

show abstract

Distributed dual averaging for convex optimization under communication delays

Cited by 95 publications

References 9 publications

Push-Sum Distributed Dual Averaging for convex optimization

Push-Sum Distributed Dual Averaging for convex optimization

Distributed Nesterov-like gradient algorithms

Distributed zero‐gradient‐sum algorithm for convex optimization with time‐varying communication delays and switching networks

Contact Info

Product

Resources

About