2021
DOI: 10.1109/tac.2020.2972824

Push–Pull Gradient Methods for Distributed Optimization in Networks

Abstract: In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents' cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider new distributed gradient-based methods where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents…
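To make the two-estimate scheme in the abstract concrete, the sketch below implements a push-pull / gradient-tracking style update on a toy quadratic problem: decision estimates are mixed ("pulled") through a row-stochastic matrix R, while gradient-tracking estimates are mixed ("pushed") through a column-stochastic matrix C. The topology, cost functions, step size, and iteration count are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative sketch of a push-pull / gradient-tracking style update.
# Assumptions: directed ring with self-loops, quadratic local costs
# f_i(x) = 0.5 * (x - b_i)^2, and hand-picked step size / iteration count.
n, alpha, iters = 5, 0.1, 200
rng = np.random.default_rng(0)
b = rng.normal(size=n)                      # each agent's private data
grad = lambda x: x - b                      # stacked per-agent gradients

A = np.eye(n) + np.roll(np.eye(n), 1, axis=1)   # adjacency: self + successor
R = A / A.sum(axis=1, keepdims=True)        # row-stochastic "pull" matrix
C = A / A.sum(axis=0, keepdims=True)        # column-stochastic "push" matrix

x = np.zeros(n)                             # estimates of the optimal decision variable
y = grad(x)                                 # estimates of the average gradient
for _ in range(iters):
    x_new = R @ (x - alpha * y)             # pull decision variables from neighbors
    y = C @ y + grad(x_new) - grad(x)       # push gradient-tracking information
    x = x_new

print(x, "vs optimum", b.mean())            # all agents approach the minimizer of the sum
```

With a strongly connected graph and a sufficiently small step size, every agent's estimate approaches the minimizer of the summed cost, here the mean of b.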

Cited by 245 publications (184 citation statements)
References 47 publications
“…For another example, in simulation-based optimization, the gradient estimation often incurs noise that can be due to various sources, such as modeling and discretization errors, incomplete convergence, and finite sample size for Monte-Carlo methods [22]. Distributed algorithms dealing with problem (1) have been studied extensively in the literature [56,36,37,28,19,20,52,13,46,34,45]. Recently, there has been considerable interest in distributed implementation of stochastic gradient algorithms [48,54,14,3,5,55,6,9,10,7,32,24,26,40,51,41,18].…”
Section: Scenarios In Which Problem…
Citation type: mentioning
Confidence: 99%
“…It is shown in [19] that the oracle complexity with doubly-stochastic weights is O(Q^2 log(1/ε)). Extensions of AB include: non-coordinated step-sizes and heavy-ball momentum [32]; time-varying graphs [36], [37]; and analysis for non-convex functions [38]. Related work on distributed Nesterov-type methods can be found in [39]–[41], which is restricted to undirected graphs.…”
Section: A Centralized Optimization: Nesterov's Methods
Citation type: mentioning
Confidence: 99%
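The heavy-ball extension mentioned in the excerpt above amounts to adding a momentum term β(x_k − x_{k−1}) to the mixing-plus-descent step of an AB-type update. The helper below is only a hypothetical sketch of that idea; the matrix names A and B (row- and column-stochastic mixing matrices), the gradient-tracking form, and the parameter values are assumptions rather than the exact scheme of [32].

```python
import numpy as np

def ab_momentum_step(x, x_prev, y, A, B, grad, alpha=0.05, beta=0.3):
    """One iteration of an AB-style update with a heavy-ball momentum term.

    Hypothetical sketch: A is a row-stochastic mixing matrix, B is a
    column-stochastic mixing matrix, grad(v) stacks each agent's local
    gradient evaluated at its own estimate, and alpha/beta are illustrative.
    """
    x_new = A @ x - alpha * y + beta * (x - x_prev)  # mix, descend, add momentum
    y_new = B @ y + grad(x_new) - grad(x)            # track the average gradient
    return x_new, x, y_new
```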
“…Since these variants only require CS weights, AB and ABN are preferable due to their faster convergence. It is further straightforward to conceive a time-varying implementation of ABN and FROZEN over gossip-based protocols or random graphs; see, e.g., the related work in [36], [37] on non-accelerated methods. Asynchronous schemes may also be derived following the methodologies studied in [42], [43].…”
Section: Algorithm 2 Frozen
Citation type: mentioning
Confidence: 99%
“…In [8] and [12], distributed gradient methods for unconstrained optimization problems are considered. [8] uses the push-sum consensus protocol to compute distributed information and proves that convergence to an optimum is possible even when the communication links change over time. [12] studies a push-pull consensus protocol for solving a distributed optimization problem and establishes linear convergence of the algorithm presented there. [5] likewise studies unconstrained gradient methods that are based on a distributed execution of Nesterov's gradient method and therefore exhibit a fast convergence rate.…”
Section: Introduction
Citation type: unclassified
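For reference, the push-sum protocol cited for [8] can be summarized in a few lines: each node repeatedly splits a running sum and a running weight over its out-neighbors, and the ratio of the two converges to the network-wide average even over directed links. The sketch below uses a fixed directed ring with self-loops purely for illustration; handling time-varying links, as discussed for [8], is not shown.

```python
import numpy as np

# Minimal push-sum average-consensus sketch over a fixed directed graph.
# Illustrative assumptions: ring topology with self-loops, static links,
# and a hand-picked iteration count.
n, iters = 5, 100
rng = np.random.default_rng(1)
values = rng.normal(size=n)              # private values whose average is sought

out_neighbors = [[i, (i + 1) % n] for i in range(n)]   # each node sends to itself and its successor
s, w = values.copy(), np.ones(n)         # push-sum numerators and weights

for _ in range(iters):
    s_new, w_new = np.zeros(n), np.zeros(n)
    for i in range(n):
        share_s = s[i] / len(out_neighbors[i])          # split mass over out-neighbors
        share_w = w[i] / len(out_neighbors[i])
        for j in out_neighbors[i]:
            s_new[j] += share_s
            w_new[j] += share_w
    s, w = s_new, w_new

print(s / w, "vs true average", values.mean())          # ratios converge to the average
```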