2018 IEEE Conference on Decision and Control (CDC)
DOI: 10.1109/cdc.2018.8619839
Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD

Abstract: The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL) algorithm that learns an infinite-horizon discounted cost function (or value function) for a given fixed policy without model knowledge. In the distributed RL case, each agent receives a local reward through local processing. Information exchange over a sparse communication…
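The abstract describes a primal-dual variant of GTD in which each agent updates local primal/dual variables from its own reward and then exchanges estimates with neighbors over a sparse network. The sketch below is only an illustration of that general scheme (GTD2-style saddle-point updates followed by a consensus step); the function name, step-size symbols, and the assumption that all agents observe the same transition are mine, not taken from the paper.

```python
import numpy as np

# Hypothetical sketch of a distributed primal-dual GTD update (not the authors' code).
# Assumptions: linear value function V(s) ~= theta^T phi(s), a fixed behavior policy,
# a doubly stochastic mixing matrix W over a sparse communication graph, and that
# every agent observes the same state transition but only its own local reward.

def distributed_gtd_step(theta, w, W, phis, phi_nexts, rewards, alpha, beta, gamma):
    """One synchronous update for N agents.

    theta, w   : (N, d) primal / dual variables, one row per agent
    W          : (N, N) doubly stochastic consensus weights
    phis       : (N, d) feature vectors phi(s_t) observed by each agent
    phi_nexts  : (N, d) feature vectors phi(s_{t+1})
    rewards    : (N,) local rewards
    """
    theta_new, w_new = np.empty_like(theta), np.empty_like(w)
    for i in range(theta.shape[0]):
        phi, phi_next = phis[i], phi_nexts[i]
        # local TD error computed with the agent's own reward
        delta = rewards[i] + gamma * phi_next @ theta[i] - phi @ theta[i]
        # dual ascent toward the local estimate of E[delta * phi]
        w_new[i] = w[i] + beta * (delta - phi @ w[i]) * phi
        # primal descent along the GTD2 direction
        theta_new[i] = theta[i] + alpha * (phi - gamma * phi_next) * (phi @ w[i])
    # consensus (mixing) step over the sparse communication network
    return W @ theta_new, W @ w_new
```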

Cited by 36 publications (40 citation statements) | References 24 publications
“…The asymptotic ones mainly concern almost sure (a.s.) convergence (Tsitsiklis et al [1986], Bianchi et al [2013], Morral et al [2014], Mathkar and Borkar [2016], Kar et al [2013], Zhang et al [2018b,a], Suttle et al [2020], Lee et al [2018]). The first four papers here provide convergence guarantees for a broad family of nonlinear DSA algorithms.…”
Section: Related Work
confidence: 99%
“…Chapter 2.1 in [22]. First, let $\bar{x}(n)$ denote the continuous, piecewise-linear function that passes through the discrete-time updates in (16), so that $\bar{x}(n(t)) = \bar{\theta}(t)$ for $t \ge 0$ and
$$\bar{x}(n) = \bar{x}(n(t)) + \frac{\bar{x}(n(t+1)) - \bar{x}(n(t))}{n(t+1) - n(t)}\,(n - n(t)), \qquad n(t) < n < n(t+1),$$
where $n(0) = 0$, $n(t) = \sum_{m=0}^{t-1} \alpha_\theta(m)$, and $n$ denotes the continuous time index. Moreover, define the function $x^s(n)$ as the unique solution of the dynamical equation (14) for $n \ge s$ with initial condition $x^s(s) = \bar{\theta}(s)$, and the function $x_s(n)$ as the unique solution of (14) for $n \le s$ with the ending condition $x_s(s) = \bar{\theta}(s)$.…”
Section: And
confidence: 99%
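The excerpt uses the standard ODE-method construction: discrete iterates are placed at cumulative step-size times and joined by straight lines, and the resulting continuous trajectory is compared with solutions of the limiting ODE. Below is a minimal numerical sketch of that interpolation; the helper name and NumPy-based implementation are my own illustration, not code from the paper.

```python
import numpy as np

# Minimal sketch of the ODE-method interpolation: iterates theta(0), theta(1), ...
# are placed at times n(t) = sum_{m < t} alpha(m) and joined linearly, yielding the
# continuous trajectory x_bar(n) with x_bar(n(t)) = theta(t).

def interpolated_trajectory(iterates, step_sizes):
    """Return (times n(t), callable x_bar) for iterates theta(t) and step sizes alpha(m)."""
    iterates = np.asarray(iterates, dtype=float)            # shape (T, d)
    times = np.concatenate(([0.0], np.cumsum(step_sizes)[:len(iterates) - 1]))

    def x_bar(n):
        # componentwise piecewise-linear interpolation between the stored iterates
        return np.array([np.interp(n, times, iterates[:, k])
                         for k in range(iterates.shape[1])])

    return times, x_bar
```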
“…Proof. Using Lemma IV.9, we need to show that $\bar{\theta}(t)$ given by (16) converges to the set Λ. Moreover, using Lemma IV.10, we need to show that the dynamics (14) converge to the set Λ.…”
Section: And
confidence: 99%
“…Macua et al [19] applied diffusion strategies to develop a fully distributed gradient temporal-difference (GTD) algorithm, then provided a mean-square-error performance analysis and established convergence under constant step-size updates. Lee et al [20] studied a new class of distributed GTD algorithms based on primal-dual iterations and proved, using ODE-based methods, that they converge almost surely to a set of stationary points. In addition, Wai et al [21] proposed a decentralized primal-dual optimization algorithm with a double-averaging update scheme to solve the policy evaluation problem in MARL and established a global geometric rate of convergence.…”
Section: Introduction
confidence: 99%
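All three cited approaches rely on a mixing (consensus) matrix that respects the sparse communication graph. As one concrete, hypothetical example of how such a doubly stochastic matrix can be built (Metropolis-Hastings weights; the cited papers do not prescribe this particular choice), consider:

```python
import numpy as np

# Illustrative helper, an assumption of this summary rather than code from [19]-[21]:
# Metropolis-Hastings weights give a symmetric, doubly stochastic mixing matrix W
# from an undirected communication graph, using only local degree information.

def metropolis_weights(adjacency):
    """Build a doubly stochastic matrix W from an undirected adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    N = A.shape[0]
    deg = A.sum(axis=1)
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j and A[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W
```

Any symmetric doubly stochastic choice with positive diagonal supports consensus averaging; the Metropolis rule is attractive because each node only needs its neighbors' degrees.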