2018 IEEE Conference on Decision and Control (CDC)
DOI: 10.1109/cdc.2018.8619839
Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD

Abstract: The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL) algorithm that learns an infinite-horizon discounted cost function (or value function) for a given fixed policy without model knowledge. In the distributed RL case, each agent receives a local reward through local processing. Information exchange over a sparse communication…
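The abstract describes a primal-dual variant of GTD in which each agent updates local primal/dual variables from its own reward and then exchanges estimates with neighbors over a sparse network. The sketch below is only an illustration of that general scheme (GTD2-style saddle-point updates followed by a consensus step); the function name, step-size symbols, and the assumption that all agents observe the same transition are mine, not taken from the paper.

```python
import numpy as np

# Hypothetical sketch of a distributed primal-dual GTD update (not the authors' code).
# Assumptions: linear value function V(s) ~= theta^T phi(s), a fixed behavior policy,
# a doubly stochastic mixing matrix W over a sparse communication graph, and that
# every agent observes the same state transition but only its own local reward.

def distributed_gtd_step(theta, w, W, phis, phi_nexts, rewards, alpha, beta, gamma):
    """One synchronous update for N agents.

    theta, w   : (N, d) primal / dual variables, one row per agent
    W          : (N, N) doubly stochastic consensus weights
    phis       : (N, d) feature vectors phi(s_t) observed by each agent
    phi_nexts  : (N, d) feature vectors phi(s_{t+1})
    rewards    : (N,) local rewards
    """
    theta_new, w_new = np.empty_like(theta), np.empty_like(w)
    for i in range(theta.shape[0]):
        phi, phi_next = phis[i], phi_nexts[i]
        # local TD error computed with the agent's own reward
        delta = rewards[i] + gamma * phi_next @ theta[i] - phi @ theta[i]
        # dual ascent toward the local estimate of E[delta * phi]
        w_new[i] = w[i] + beta * (delta - phi @ w[i]) * phi
        # primal descent along the GTD2 direction
        theta_new[i] = theta[i] + alpha * (phi - gamma * phi_next) * (phi @ w[i])
    # consensus (mixing) step over the sparse communication network
    return W @ theta_new, W @ w_new
```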

Cited by 36 publications (40 citation statements) | References 24 publications
“…The asymptotic ones mainly concern almost sure (a.s.) convergence (Tsitsiklis et al [1986], Bianchi et al [2013], Morral et al [2014], Mathkar and Borkar [2016], Kar et al [2013], Zhang et al [2018b,a], Suttle et al [2020], Lee et al [2018]). The first four papers here provide convergence guarantees for a broad family of nonlinear DSA algorithms.…”
Section: Related Work
confidence: 99%
“…Chapter 2.1 in [22]. First, let $\bar{x}(n)$ denote the continuous, piecewise-linear function that passes through the discrete-time updates in (16), so that $\bar{x}(n(t)) = \bar{\theta}(t)$ for $t \ge 0$ and
$$\bar{x}(n) = \bar{x}(n(t)) + \frac{\bar{x}(n(t+1)) - \bar{x}(n(t))}{n(t+1) - n(t)}\,(n - n(t)), \qquad n(t) < n < n(t+1),$$
where $n(0) = 0$, $n(t) = \sum_{m=0}^{t-1} \alpha_\theta(m)$, and $n$ denotes the continuous time index. Moreover, define the function $x^s(n)$ as the unique solution of the dynamical equation (14) for $n \ge s$ with initial condition $x^s(s) = \bar{\theta}(s)$, and the function $x_s(n)$ as the unique solution of (14) for $n \le s$ with the ending condition $x_s(s) = \bar{\theta}(s)$.…”
Section: And
confidence: 99%
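The excerpt uses the standard ODE-method construction: discrete iterates are placed at cumulative step-size times and joined by straight lines, and the resulting continuous trajectory is compared with solutions of the limiting ODE. Below is a minimal numerical sketch of that interpolation; the helper name and NumPy-based implementation are my own illustration, not code from the paper.

```python
import numpy as np

# Minimal sketch of the ODE-method interpolation: iterates theta(0), theta(1), ...
# are placed at times n(t) = sum_{m < t} alpha(m) and joined linearly, yielding the
# continuous trajectory x_bar(n) with x_bar(n(t)) = theta(t).

def interpolated_trajectory(iterates, step_sizes):
    """Return (times n(t), callable x_bar) for iterates theta(t) and step sizes alpha(m)."""
    iterates = np.asarray(iterates, dtype=float)            # shape (T, d)
    times = np.concatenate(([0.0], np.cumsum(step_sizes)[:len(iterates) - 1]))

    def x_bar(n):
        # componentwise piecewise-linear interpolation between the stored iterates
        return np.array([np.interp(n, times, iterates[:, k])
                         for k in range(iterates.shape[1])])

    return times, x_bar
```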
“…Proof. Using Lemma IV.9, we need to show that $\bar{\theta}(t)$ given by (16) converges to the set Λ. Moreover, using Lemma IV.10, we need to show that the dynamics (14) converge to the set Λ.…”
Section: And
confidence: 99%
“…Macua et al [19] applied diffusion strategies to develop a fully distributed gradient temporal-difference (GTD) algorithm, then provided a mean-square-error performance analysis and established convergence under constant step-size updates. Lee et al [20] studied a new class of distributed GTD algorithms based on primal-dual iterations and proved, using ODE-based methods, that they converge almost surely to a set of stationary points. In addition, Wai et al [21] proposed a decentralized primal-dual optimization algorithm with a double-averaging update scheme to solve the policy evaluation problem in MARL and established a global geometric rate of convergence.…”
Section: Introduction
confidence: 99%
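All three cited approaches rely on a mixing (consensus) matrix that respects the sparse communication graph. As one concrete, hypothetical example of how such a doubly stochastic matrix can be built (Metropolis-Hastings weights; the cited papers do not prescribe this particular choice), consider:

```python
import numpy as np

# Illustrative helper, an assumption of this summary rather than code from [19]-[21]:
# Metropolis-Hastings weights give a symmetric, doubly stochastic mixing matrix W
# from an undirected communication graph, using only local degree information.

def metropolis_weights(adjacency):
    """Build a doubly stochastic matrix W from an undirected adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    N = A.shape[0]
    deg = A.sum(axis=1)
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j and A[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W
```

Any symmetric doubly stochastic choice with positive diagonal supports consensus averaging; the Metropolis rule is attractive because each node only needs its neighbors' degrees.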