2021
DOI: 10.48550/arxiv.2112.07859
Preprint

Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games

Abstract: Learning in stochastic games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL). In this paper, we consider decentralized MARL in stochastic games in the non-asymptotic regime. In particular, we establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class of general-sum stochastic games (SGs), namely weakly acyclic SGs, which includes the common cooperative MARL setting with an identical reward to all agents (a Markov team pr…
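To make "fully decentralized Q-learning" concrete, the sketch below shows generic independent Q-learning in a stochastic game: each agent keeps a local Q-table over the global state and its own action, and updates it from its own reward alone, ignoring the other agents' actions. This is an illustrative simplification, not the paper's exact phased algorithm; the environment interface (env.reset, env.step) and all parameter values are hypothetical.

```python
# Minimal sketch of independent (fully decentralized) Q-learning in a
# stochastic game. Each agent i keeps Q[i] of shape (n_states, n_actions)
# over (global state, own action) and learns only from its own reward.
# NOT the paper's exact algorithm; env interface is a hypothetical stand-in.
import numpy as np

def independent_q_learning(env, n_agents, n_states, n_actions,
                           episodes=1000, alpha=0.1, gamma=0.95, eps=0.1,
                           seed=0):
    rng = np.random.default_rng(seed)
    Q = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]
    for _ in range(episodes):
        s = env.reset()                      # global state, observed by all agents
        done = False
        while not done:
            # Each agent chooses its own action eps-greedily from its own Q-table.
            a = [int(rng.integers(n_actions)) if rng.random() < eps
                 else int(np.argmax(Q[i][s])) for i in range(n_agents)]
            s_next, rewards, done = env.step(a)   # rewards: one scalar per agent
            for i in range(n_agents):
                target = rewards[i] + gamma * (0.0 if done else np.max(Q[i][s_next]))
                Q[i][s, a[i]] += alpha * (target - Q[i][s, a[i]])
            s = s_next
    return Q
```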

Cited by 3 publications (3 citation statements)
References 62 publications

“…The dynamic presented can converge to a (stationary pure-strategy) equilibrium if the associated normal-form game is weakly acyclic with respect to best (or better) response dynamics. The finite-sample complexity of the algorithm was also established recently in [Gao et al., 2021]. In contrast, our learning dynamic can converge to a stationary mixed-strategy equilibrium, which is essential for a global convergence result across the MG spectrum, as a pure-strategy equilibrium does not exist in general, e.g., in zero-sum games.…”
Section: Independent Learning in MGs
confidence: 89%
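For context on the weak acyclicity condition in the quoted statement, the sketch below runs sequential best-response dynamics on a small two-player normal-form game: in a weakly acyclic game, some best-response path from every joint action reaches a pure-strategy equilibrium, so this iteration can terminate at one. The payoff matrices and function names are illustrative assumptions, not taken from the cited papers.

```python
# Minimal sketch of sequential best-response dynamics in a two-player
# normal-form game; illustrative only, not the cited papers' algorithm.
import numpy as np

def best_response_dynamics(payoffs, start=(0, 0), max_iters=100):
    """payoffs[i][a0, a1] is player i's payoff at joint action (a0, a1)."""
    a = list(start)
    for _ in range(max_iters):
        updated = False
        for i in (0, 1):
            # Player i's payoffs against the other player's current action.
            line = payoffs[i][:, a[1]] if i == 0 else payoffs[i][a[0], :]
            best = int(np.argmax(line))
            if line[best] > line[a[i]]:
                a[i] = best                  # switch to a strict best response
                updated = True
        if not updated:                      # no profitable deviation: pure equilibrium
            return tuple(a)
    return None                              # cycled (game may not be weakly acyclic)

# Example: a 2x2 team game with an identical payoff matrix for both players.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
print(best_response_dynamics([A, A]))        # -> (0, 0)
```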
“…Such asymmetric policy gradient methods are not completely independent, as some implicit coordination is required to enable such a timescale separation across agents. This style of implicit coordination is also required for the finite-sample analysis of decentralized learning in certain general-sum stochastic games, e.g., Gao et al. (2021), which improves on the asymptotic convergence in Arslan and Yüksel (2017).…”
Section: Sample-Efficient MARL
confidence: 99%
“…The asymptotic convergence of Q-learning with linear function approximation was established in [180] under a "negative drift" assumption. Under similar assumptions, the finite-sample analysis of Q-learning, as well as its on-policy variant SARSA, was performed in [49, 185, 183, 186] when using linear function approximation, and in [187, 118] when using neural network approximation. However, such a negative-drift assumption is highly artificial and restrictive, and is impossible to satisfy unless the discount factor of the MDP is extremely small (see Chapter 11).…”
Section: Related Literature
confidence: 99%
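As a reference point for the quoted discussion, the sketch below shows generic semi-gradient Q-learning with linear function approximation, where Q_w(s, a) = phi(s, a)^T w is learned under an eps-greedy behavior policy. It is a minimal illustration under assumed interfaces (phi, env), not the specific algorithms analyzed in the cited works, and it does not model the "negative drift" assumption discussed above.

```python
# Minimal sketch of semi-gradient Q-learning with linear function
# approximation: Q_w(s, a) = phi(s, a)^T w. Interfaces phi/env are assumed.
import numpy as np

def linear_q_learning(env, phi, n_actions, dim, steps=10_000,
                      alpha=0.05, gamma=0.95, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)                        # linear weight vector
    q = lambda s, a: phi(s, a) @ w           # current Q-value estimate
    s = env.reset()
    for _ in range(steps):
        # eps-greedy behavior policy over the approximate Q-values.
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax([q(s, b) for b in range(n_actions)]))
        s_next, r, done = env.step(a)
        # Semi-gradient update: bootstrap with max_b Q_w(s', b), treat the
        # target as a constant, and step along phi(s, a).
        target = r if done else r + gamma * max(q(s_next, b) for b in range(n_actions))
        w += alpha * (target - q(s, a)) * phi(s, a)
        s = env.reset() if done else s_next
    return w
```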