DOI: 10.22215/etd/2017-11788

On Multi-Agent Reinforcement Learning in Matrix, Stochastic and Differential Games

Abstract: In this thesis, we investigate reinforcement learning algorithms on matrix, stochastic, and differential games. In matrix and stochastic games, the states and actions are represented in continuous domains. We propose two decentralized multi-agent reinforcement learning algorithms to solve the problem of learning in matrix and stochastic games when the learning agent has only minimum knowledge about the underlying game and the other learning agents. The proposed algorithms are the constant learning rate-based e…

Cited by 1 publication (1 citation statement)
References 91 publications (258 reference statements)
“…For example, in [6] and more recently [2], the authors focused on scenarios where players play in a coordinated manner a finite-horizon version of a zero-sum stochastic game within repeated episodes, referred to as episodic reinforcement learning, even though player payoffs are defined over an infinite horizon. In another line of work, in [32] and [1], the authors presented and studied algorithms that update policies only at certain time instances while keeping them fixed in between, even when players may have an incentive to change their actions, in order to create a stationary environment for learning the underlying model or estimating the associated Q-functions.…”
Section: Introduction (citation type: mentioning)
confidence: 99%