2021
DOI: 10.48550/arxiv.2109.01795
Preprint

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

Abstract: Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we derive that computing an approximate Markov Perfect Equilibrium (MPE) in a finite-state discounted Stochastic Game within the exponential precision is PPAD-complete. We adopt a function with a polynomially bounded description in the strategy space to convert the MPE computation to a fi…
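For readers skimming the citation context, here is a minimal sketch of the equilibrium notion the abstract refers to, in generic notation assumed for illustration (it is not taken verbatim from the paper):

% Illustrative setup (assumed, not the paper's notation): a finite discounted SG
% with players i = 1, ..., n, state set S, joint actions A = A_1 x ... x A_n,
% rewards r_i(s, a), transition kernel P(s' | s, a), and discount gamma in (0, 1).
% A stationary Markov strategy profile \pi = (\pi_1, ..., \pi_n) is a Markov
% Perfect Equilibrium when no player gains from a unilateral deviation at any state:
\[
  V_i^{\pi}(s) \;\ge\; V_i^{(\pi_i',\, \pi_{-i})}(s)
  \qquad \forall i,\ \forall s \in S,\ \forall \pi_i' ,
\]
% and an \varepsilon-approximate MPE relaxes the inequality by an additive \varepsilon:
\[
  V_i^{\pi}(s) \;\ge\; V_i^{(\pi_i',\, \pi_{-i})}(s) - \varepsilon .
\]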

Cited by 8 publications (7 citation statements)
References 33 publications
“…It has been typically applied to self-driving (Shalev-Shwartz et al., 2016), order dispatching, modeling population dynamics, and gaming AIs (Peng et al., 2017; Zhou et al., 2021). However, the scheme of learning policy from experience requires algorithms with high computational complexity (Deng et al., 2021) and sample efficiency due to the limited computing resources and high cost resulting from the data collection (Haarnoja et al., 2018; Munos et al., 2016; Espeholt et al., 2019). Furthermore, even in domains where the online environment is feasible, we might still prefer to utilise previously-collected data instead; for example, if the domain's complexity requires large datasets for effective generalisation.…”
Section: Introduction
confidence: 99%
“…A well-known solution concept that describes the equilibrium of NF games is Nash Equilibrium (NE) [11, 30]. Let p_i(s) where…”
Section: Preliminaries and Notations
confidence: 99%
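The quoted definition is cut off; as a hedged reminder in standard textbook notation (not the citing paper's own symbols), the NE condition for a normal-form game reads:

% Standard normal-form setup (assumed for illustration): players i = 1, ..., n,
% mixed strategies p_i over finite action sets A_i, and expected utilities u_i.
% A profile p^* = (p_1^*, ..., p_n^*) is a Nash Equilibrium if no player can
% improve by deviating unilaterally:
\[
  u_i\!\left(p_i^{*}, p_{-i}^{*}\right) \;\ge\; u_i\!\left(p_i, p_{-i}^{*}\right)
  \qquad \forall i,\ \forall p_i \in \Delta(A_i).
\]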
“…Stochastic games (SGs) (Shapley 1953; Deng et al. 2021) offer a multi-player game framework where agents jointly decide the loss and the state transition. Compared to OMDPs, the main difference is that SGs allow each player to have a representation of states, actions and rewards, thus players can learn the representations over time and find the NE of the stochastic games (Wei, Hong, and Lu 2017; Tian et al. 2020).…”
Section: Related Work
confidence: 99%
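For context on the comparison with OMDPs made above, a brief sketch of the standard stochastic-game objective, in generic notation assumed here rather than taken from either paper:

% A finite discounted stochastic game is commonly written as the tuple
% G = (N, S, {A_i}, P, {r_i}, gamma): player set N, state set S, action sets A_i,
% joint-action transition kernel P(s' | s, a), per-player rewards r_i(s, a),
% and discount factor gamma in (0, 1).  Given a joint policy \pi, each player i
% evaluates states by the discounted return it induces:
\[
  V_i^{\pi}(s) \;=\; \mathbb{E}_{\pi, P}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_t) \,\middle|\, s_0 = s \right],
\]
% and equilibrium analysis (NE / MPE) is phrased in terms of these value functions.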