2022
DOI: 10.1109/access.2022.3211395

Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method

Abstract: The goal of this paper is to provide theoretical analysis and additional insights on a distributed temporal-difference (TD)-learning algorithm for multi-agent Markov decision processes (MDPs) from a saddle-point viewpoint. (Single-agent) TD-learning is a reinforcement learning (RL) algorithm for evaluating a given policy based on reward feedback. In multi-agent settings, multiple RL agents act concurrently, and each agent receives its own local rewards. The goal of each agent is to evaluate a given polic…

Cited by 2 publications (3 citation statements)
References 42 publications
“…The other assumptions are also met. Therefore, the PDGD of Problem 8, given in (12), is globally asymptotically stable, and converges to its unique equilibrium point (θ*, λ*), where λ* is defined in (11). This completes the proof.…”
Section: Gtd3 | Citation type: supporting
confidence: 51%
“…In particular, GTD2, proposed in [4], can be interpreted as a stochastic primal-dual gradient dynamics (PDGD) of a convex-concave saddle-point problem, and hence, its convergence analysis can be approached from a different angle using optimization theory [5], [6]. These interpretations were subsequently applied to distributed RL problems in [8], [9], [10], and [11].…”
Section: Introduction | Citation type: mentioning
confidence: 99%
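
The primal-dual reading of GTD2 quoted in the statement above can be illustrated with a minimal sketch. It assumes linear value estimates θᵀφ and the saddle-point objective commonly used in the GTD literature, L(θ, w) = wᵀ(b − Aθ) − ½ wᵀCw with A = E[φ(φ − γφ')ᵀ], b = E[rφ], C = E[φφᵀ]. None of this notation appears on this page, and the function names `dot` and `gtd2_step` are hypothetical. Under those assumptions, one GTD2 update is a sampled gradient-descent step in θ paired with a sampled gradient-ascent step in w:

```python
def dot(u, v):
    """Inner product of two equal-length vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def gtd2_step(theta, w, phi, reward, phi_next, gamma, alpha, beta):
    """One stochastic primal-dual step on the assumed objective
    L(theta, w) = w.(b - A theta) - 0.5 * w.C w (illustrative notation)."""
    # TD error evaluated at the current primal iterate theta.
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    # Sampled value of phi^T w, used by both updates.
    phi_w = dot(phi, w)
    # Primal descent: the sampled gradient of L in theta is
    # -(phi - gamma * phi_next) * phi_w, so we step in the opposite direction.
    theta_new = [
        t + alpha * (p - gamma * pn) * phi_w
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    # Dual ascent: the sampled gradient of L in w is (delta - phi_w) * phi.
    w_new = [wi + beta * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return theta_new, w_new


# Example transition with two-dimensional features (values chosen arbitrarily).
theta, w = gtd2_step(
    theta=[0.0, 0.0], w=[0.5, 0.0],
    phi=[1.0, 0.0], reward=1.0, phi_next=[0.0, 1.0],
    gamma=0.5, alpha=0.1, beta=0.1,
)
```

Both iterates are updated from the same pre-step values, which mirrors the simultaneous form of primal-dual gradient dynamics; the step sizes here are placeholders and say nothing about the conditions under which the cited convergence results hold.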