2023
DOI: 10.48550/arxiv.2303.03100
Preprint

A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Abstract: We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main results provide finite-sample guarantees. In particular, we prove the first-known Õ(1/ε²) sample complexity bound…

Cited by 0 publications
References 60 publications