We present a novel variant of fictitious play dynamics that combines classical fictitious play with Q-learning for stochastic games, and analyze its convergence properties in two-player zero-sum stochastic games. In our dynamics, each player forms beliefs about the opponent's strategy and about its own continuation payoff (Q-function), and plays a greedy best response using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the beliefs on Q-functions are updated on a slower timescale than the beliefs on strategies. We show that in both the model-based and the model-free case (i.e., without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
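To make the two-timescale structure concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm or step-size conditions) of model-free dynamics of this kind on a randomly generated two-player zero-sum stochastic game: each player keeps an empirical belief about the opponent's strategy, a Q-function estimate of its own continuation payoff, plays a greedy best response to the belief, and updates the Q-function with a smaller step size than the strategy belief. The game size, discount factor, and step-size schedules are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9

# Random zero-sum stage payoffs r[s, a1, a2] (player 2 receives -r)
# and state transition kernel P[s, a1, a2, s'].
r = rng.uniform(-1, 1, size=(n_states, n_actions, n_actions))
P = rng.uniform(size=(n_states, n_actions, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)

# Beliefs: empirical opponent strategy per state, and own Q-function
# over (state, own action, opponent action).
pi_hat = [np.full((n_states, n_actions), 1.0 / n_actions) for _ in range(2)]
Q = [np.zeros((n_states, n_actions, n_actions)) for _ in range(2)]

def best_response(i, s):
    """Greedy best response of player i at state s against belief pi_hat[i][s]."""
    expected = Q[i][s] @ pi_hat[i][s]   # expected payoff of each own action
    return int(np.argmax(expected))

s = 0
for n in range(1, 50_001):
    alpha = 1.0 / n ** 0.6   # faster timescale: strategy beliefs
    beta = 1.0 / n           # slower timescale: Q-function beliefs

    a = [best_response(0, s), best_response(1, s)]
    s_next = rng.choice(n_states, p=P[s, a[0], a[1]])
    reward = [r[s, a[0], a[1]], -r[s, a[0], a[1]]]   # zero-sum payoffs

    for i in range(2):
        opp = a[1 - i]
        # Fictitious-play update of the belief on the opponent's strategy.
        pi_hat[i][s] *= (1 - alpha)
        pi_hat[i][s][opp] += alpha
        # Model-free update of the continuation-payoff (Q-function) belief,
        # using the observed reward and next state only.
        v_next = np.max(Q[i][s_next] @ pi_hat[i][s_next])
        target = reward[i] + gamma * v_next
        Q[i][s, a[i], opp] += beta * (target - Q[i][s, a[i], opp])

    s = s_next

print("Player 1's belief about player 2's stationary strategy:")
print(np.round(pi_hat[0], 3))
```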