2021
DOI: 10.48550/arxiv.2106.00198
Preprint
Gradient play in stochastic games: stationary points, convergence, and sample complexity

Abstract: We study the performance of the gradient play algorithm for multi-agent tabular Markov decision processes (MDPs), also known as stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on current state information, which is shared between agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivale…
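To make the abstract concrete, the following is a minimal sketch of the gradient play update it describes: two agents in a small tabular stochastic game, each holding a directly parameterized policy (a probability table over actions per state) and taking simultaneous projected policy-gradient steps. This is an illustrative reconstruction, not the paper's code; the function names (`simplex_project`, `gradient_play`, `evaluate`) and the choice of an identical-interest reward are assumptions made here for the example.

```python
import numpy as np

def simplex_project(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def evaluate(pi1, pi2, P, r, gamma, mu):
    """Exact policy evaluation under the joint policy (pi1, pi2).

    Returns the value vector V, each agent's marginal Q-table, and the
    discounted state-visitation weights d (starting distribution mu)."""
    S = P.shape[0]
    joint = np.einsum('sa,sb->sab', pi1, pi2)            # joint action probs
    P_pi = np.einsum('sab,sabt->st', joint, P)           # state-to-state kernel
    r_pi = np.einsum('sab,sab->s', joint, r)             # expected reward
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    backup = r + gamma * np.einsum('sabt,t->sab', P, V)  # one-step lookahead
    Q1 = np.einsum('sb,sab->sa', pi2, backup)            # average out agent 2
    Q2 = np.einsum('sa,sab->sb', pi1, backup)            # average out agent 1
    d = (1.0 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, mu)
    return V, Q1, Q2, d

def gradient_play(P, r, gamma, mu, eta=0.01, iters=300):
    """Simultaneous projected gradient ascent on directly parameterized
    policies -- the 'gradient play' update, starting from uniform policies."""
    S, A = P.shape[0], P.shape[1]
    pi1 = np.full((S, A), 1.0 / A)
    pi2 = np.full((S, A), 1.0 / A)
    for _ in range(iters):
        _, Q1, Q2, d = evaluate(pi1, pi2, P, r, gamma, mu)
        g1 = d[:, None] * Q1   # policy-gradient direction for agent 1
        g2 = d[:, None] * Q2   # policy-gradient direction for agent 2
        pi1 = np.vstack([simplex_project(row) for row in pi1 + eta * g1])
        pi2 = np.vstack([simplex_project(row) for row in pi2 + eta * g2])
    return pi1, pi2
```

When both agents share one reward table (an identical-interest game, the simplest Markov potential game), the common value itself is the potential, and each agent's update direction is exactly the partial gradient of that potential, so small-step gradient play ascends it.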

Cited by 4 publications (21 citation statements)
References 40 publications
“…• We propose an independent policy gradient algorithm (Algorithm 1) for learning an ε-Nash equilibrium of MPGs with O(1/ε²) iteration complexity. In contrast to the existing results (Leonardos et al., 2021; Zhang et al., 2021b), such iteration complexity does not explicitly depend on the state space size.…”
Section: Introduction
confidence: 60%
“…In this paper, we provide the first affirmative answer to this question for a class of mixed cooperative/competitive Markov games, the so-called Markov potential games (MPGs) (Macua et al., 2018; Leonardos et al., 2021; Zhang et al., 2021b). In particular, we make the following contributions:…”
Section: Introduction
confidence: 94%
“…An application of RL for finding NE in linear-quadratic mean-field games has been studied in [26]. Moreover, [27], [28] show that n-player Markov potential games, an extension of static potential games to dynamic stochastic games, admit polynomial-time algorithms for computing their NE policies. Unfortunately, the class of Markov potential games is very restrictive, because it requires strong assumptions on the existence of a general potential function.…”
confidence: 99%
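For context, the potential-function requirement the statement above refers to can be written as follows (a standard definition of Markov potential games, stated here for the reader and not quoted from this paper): there must exist a single function Φ such that, for every agent i, every state s, and any unilateral deviation from π_i to π_i',

```latex
\Phi(s;\, \pi_i', \pi_{-i}) - \Phi(s;\, \pi_i, \pi_{-i})
  \;=\; V_i(s;\, \pi_i', \pi_{-i}) - V_i(s;\, \pi_i, \pi_{-i}).
```

That is, every agent's change in value from a unilateral policy change is captured by one shared function Φ. Identical-interest games satisfy this trivially with Φ equal to the common value function, which illustrates why the condition is restrictive in genuinely competitive settings.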