2020 59th IEEE Conference on Decision and Control (CDC)
DOI: 10.1109/cdc42340.2020.9303966
Finite-Sample Analysis of Multi-Agent Policy Evaluation with Kernelized Gradient Temporal Difference

Cited by 8 publications (8 citation statements)
References 22 publications
“…Distributed TD-learning algorithms are studied in [8]–[14]. The results in [8], [9], [34] consider central rewards with different assumptions.…”
Section: B: Policy Evaluation Problem
confidence: 99%
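To make the setting of these citing works concrete, here is a minimal sketch of multi-agent TD(0) with linear value features, where agents observe local rewards and average parameters over a doubly stochastic mixing matrix. The names (W, phi_s, distributed_td0_step) and the step sizes are illustrative assumptions, not taken from any cited paper.

```python
import numpy as np

def distributed_td0_step(theta, W, phi_s, phi_next, rewards, gamma=0.95, lr=0.1):
    """One consensus + TD(0) step.
    theta:   (n_agents, d) local linear parameters
    W:       (n_agents, n_agents) doubly stochastic mixing matrix
    rewards: (n_agents,) local rewards at the current transition
    """
    mixed = W @ theta                                               # consensus averaging
    deltas = rewards + gamma * (theta @ phi_next) - (theta @ phi_s) # local TD errors
    return mixed + lr * np.outer(deltas, phi_s)                     # local TD(0) correction

# Toy run: 3 agents on a fully mixed graph, 4 random features.
rng = np.random.default_rng(0)
W = np.full((3, 3), 0.25) + 0.25 * np.eye(3)
theta = np.zeros((3, 4))
for _ in range(100):
    phi_s, phi_next = rng.normal(size=4), rng.normal(size=4)
    theta = distributed_td0_step(theta, W, phi_s, phi_next, rng.normal(size=3))
```

A "central reward" variant, as considered in [8], [9], [34], would replace the local `rewards` vector with one shared scalar reward broadcast to all agents.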
“…The work in [13] develops the so-called homotopy stochastic primal-dual algorithm with local actions and adaptive learning rates. A distributed gradient TD-learning algorithm with value functions that lie in reproducing kernel Hilbert spaces is proposed in [14], together with its finite-sample analysis.…”
Section: B: Policy Evaluation Problem
confidence: 99%
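The kernelized gradient TD idea referenced here represents the value function as an expansion V(s) = Σᵢ θᵢ k(sᵢ, s) in an RKHS. The following is a minimal single-agent sketch under assumed choices (an RBF kernel, GTD2-style two-timescale updates, and a dictionary that simply grows with every visited state, with no sparsification); it is not the paper's exact algorithm.

```python
import numpy as np

def rbf_kernel(x, y, bw=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * bw ** 2))

class KernelGTD:
    """Illustrative gradient TD policy evaluation with an RKHS value function."""

    def __init__(self, gamma=0.95, lr_theta=0.1, lr_w=0.05):
        self.gamma, self.lr_theta, self.lr_w = gamma, lr_theta, lr_w
        self.dictionary = []        # states anchoring the kernel expansion
        self.theta = np.zeros(0)    # expansion coefficients of V
        self.w = np.zeros(0)        # auxiliary (fast-timescale) weights

    def features(self, s):
        return np.array([rbf_kernel(d, s) for d in self.dictionary])

    def value(self, s):
        return float(self.theta @ self.features(s)) if self.dictionary else 0.0

    def update(self, s, r, s_next):
        # Grow the expansion with the visited state (no sparsification here).
        self.dictionary.append(np.asarray(s, dtype=float))
        self.theta = np.append(self.theta, 0.0)
        self.w = np.append(self.w, 0.0)
        phi, phi_next = self.features(s), self.features(np.asarray(s_next, dtype=float))
        delta = r + self.gamma * self.theta @ phi_next - self.theta @ phi
        # GTD2-style two-timescale step on the expansion coefficients.
        self.theta = self.theta + self.lr_theta * (phi - self.gamma * phi_next) * (self.w @ phi)
        self.w = self.w + self.lr_w * (delta - self.w @ phi) * phi

agent = KernelGTD()
for s, r, s_next in [(0.0, 1.0, 0.5), (0.5, 0.0, 1.0), (1.0, 1.0, 0.0)]:
    agent.update(s, r, s_next)
print(agent.value(0.25))
```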
“…Similar bounds have also been derived in Chen et al. [2021] for two distributed variants of the TDC method. There are also some other works that derive finite-time bounds Wai et al. [2018], Ding et al. [2019], Xu et al. [2020], Zhao et al. [2020], Heredia and Mou [2020], Stanković et al. [2020], Ren et al. [2021], Zhang et al. [2021b], but we do not discuss them in this paper since the algorithms proposed there do not fit the update rule given in (2).…”
Section: Related Work
confidence: 99%
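For reference, the classic (single-agent, linear) TDC update that the distributed variants build on takes the following form; this is the textbook rule, shown only as background, not the citing paper's update rule "(2)", which is not reproduced here.

```python
import numpy as np

def tdc_step(theta, w, phi_s, phi_next, r, gamma=0.95, alpha=0.05, beta=0.1):
    """One TDC (TD with gradient correction) step with linear features."""
    delta = r + gamma * phi_next @ theta - phi_s @ theta         # TD error
    theta = theta + alpha * (delta * phi_s - gamma * (w @ phi_s) * phi_next)
    w = w + beta * (delta - w @ phi_s) * phi_s                   # fast auxiliary iterate
    return theta, w
```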
“…In the aforementioned references (limited to discounted objectives), convergence guarantees are mostly asymptotic, apply only to MARL sub-problems such as policy evaluation (estimating the value function assuming a fixed policy (Heredia & Mou, 2020; Sha et al., 2020)), or, due to the non-convexity induced by policy parameterization, cannot avoid spurious policies (Qu et al., 2020a,b); see (Zhang et al., 2020) for further details.…”
Section: Introduction
confidence: 99%