2006
DOI: 10.1007/s10458-006-9007-0

Exploring selfish reinforcement learning in repeated games with stochastic rewards

Abstract: In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards, by using coordinated exploration. First, two ESRL algorithms for respectively common interest and conflicting interest games are presented. Both ESRL algorithms are based on the same idea, i.e. an agent explores by temporarily excluding some of the local actions from its private…
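To make the coordinated-exploration idea concrete, here is a minimal Python sketch, not the paper's exact procedure: agents run a standard linear reward-inaction (L_R-I) learning-automaton update during an exploration phase, then each temporarily excludes the action it converged to from its private action space before the next phase. The class and parameter names (ESRLAgent, lr, the phase lengths, the toy payoff table) are illustrative assumptions, and the bookkeeping that remembers the best joint solution found across phases is omitted.

```python
import random

class ESRLAgent:
    """Toy agent (illustrative, not the paper's exact algorithm): learns
    with an L_R-I automaton over its currently active actions, then
    temporarily excludes the action it converged to."""

    def __init__(self, n_actions, lr=0.05):
        self.active = list(range(n_actions))   # private action space
        self.lr = lr
        self.reset_probs()

    def reset_probs(self):
        self.probs = [1.0 / len(self.active)] * len(self.active)

    def choose(self):
        # Sample an index into self.active according to the action probabilities.
        return random.choices(range(len(self.active)), weights=self.probs)[0]

    def update(self, idx, reward):
        # Linear reward-inaction update; reward is assumed to lie in [0, 1].
        for j in range(len(self.probs)):
            if j == idx:
                self.probs[j] += self.lr * reward * (1.0 - self.probs[j])
            else:
                self.probs[j] -= self.lr * reward * self.probs[j]

    def exclude_converged(self):
        # Drop the most probable action so the next exploration phase is
        # forced into the remaining part of the joint strategy space.
        if len(self.active) > 1:
            del self.active[max(range(len(self.probs)), key=self.probs.__getitem__)]
        self.reset_probs()

# Common-interest game with stochastic (Bernoulli) rewards: both agents
# receive reward 1 with a probability that depends on the joint action.
PAYOFF = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.6}

random.seed(0)
agents = [ESRLAgent(2), ESRLAgent(2)]
for phase in range(2):                        # alternate explore / exclude
    for t in range(3000):                     # exploration phase
        idxs = [a.choose() for a in agents]
        joint = tuple(a.active[i] for a, i in zip(agents, idxs))
        r = 1.0 if random.random() < PAYOFF[joint] else 0.0
        for a, i in zip(agents, idxs):
            a.update(i, r)
    for a in agents:                          # synchronization: exclude
        a.exclude_converged()
```

The point of the exclusion step is that agents which converge to the same attractor in every phase would otherwise never sample the joint actions needed to find a better solution; shrinking each private action space in lockstep is what makes the exploration coordinated.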

Cited by 37 publications (34 citation statements) · References 16 publications
“…In the past few years, a significant part of the research has focused on comprehending and solving single-stage multi-agent problems, modeled as normal form games (Verbeeck, Nowé, Parent, & Tuyls, 2007), and multi-stage problems, modeled as Markov games (Verbeeck, Nowé, Peeters, & Tuyls, 2005). Wheeler et al. have shown that a set of decentralized learning automata is able to control a finite Markov chain with unknown transition probabilities and rewards (Wheeler & Narendra, 1986).…”
Section: Related Work
confidence: 99%
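The Wheeler & Narendra result can be visualized with one learning automaton per state of the chain, each updated only through sampled interaction. The sketch below is a simplified Python illustration under stated assumptions: the original scheme feeds each automaton an estimate of the reward accumulated between revisits of its state, whereas this sketch uses the immediate reward, and the transition and reward values are invented.

```python
import random

def lri_update(probs, chosen, reward, lr=0.05):
    """Linear reward-inaction update; reward is assumed to lie in [0, 1]."""
    for j in range(len(probs)):
        if j == chosen:
            probs[j] += lr * reward * (1.0 - probs[j])
        else:
            probs[j] -= lr * reward * probs[j]

# Toy 2-state, 2-action Markov chain. The transition rows P[s][a] and the
# reward probabilities R[s][a] are made up and hidden from the automata,
# which only observe sampled rewards and state visits.
P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
R = {0: {0: 0.2, 1: 0.7}, 1: {0: 0.9, 1: 0.3}}

random.seed(0)
probs = {s: [0.5, 0.5] for s in P}            # one automaton per state
state = 0
for t in range(20000):
    action = random.choices([0, 1], weights=probs[state])[0]
    reward = 1.0 if random.random() < R[state][action] else 0.0
    lri_update(probs[state], action, reward)   # update this state's automaton
    state = random.choices([0, 1], weights=P[state][action])[0]
```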
“…Unfortunately, these algorithms involve searching the whole strategy space, so their convergence time is exponential. Another algorithm that uses stages to provide a stable learning environment is the ESRL algorithm for coordinated exploration (Verbeeck, Nowé, Parent, & Tuyls, 2007). Marden, Arslan, and Shamma (2007b) and Marden, Young, Arslan, and Shamma (2009) use an algorithm with experimentation and best replies, but without explicit stages, that converges for weakly acyclic games, where best-reply dynamics converge when agents move one at a time rather than all at once, as we assume here.…”
Section: Related Work
confidence: 99%
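For intuition on the weak acyclicity condition mentioned above, the toy Python sketch below shows one-at-a-time best-reply dynamics settling on a pure Nash equilibrium of a coordination game. It is not Marden et al.'s payoff-based algorithm, and the payoff table is invented for illustration.

```python
import random

# Toy common-payoff coordination game; weakly acyclic, with pure Nash
# equilibria at (0, 0) and (1, 1).
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 0, (1, 1): 2}

def best_reply(player, joint):
    """Best reply of `player`, holding the other player's action fixed."""
    def value(a):
        trial = list(joint)
        trial[player] = a
        return PAYOFF[tuple(trial)]
    return max((0, 1), key=value)

random.seed(0)
joint = [random.choice((0, 1)) for _ in range(2)]
for step in range(10):
    player = step % 2        # agents move one at a time, not all at once
    joint[player] = best_reply(player, joint)
print(joint)                 # settles on a pure Nash equilibrium
```

If both players instead best-replied simultaneously, they could cycle between (0, 1) and (1, 0) forever; moving one agent at a time is exactly the condition under which best-reply dynamics converge in weakly acyclic games.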
“…In previous literature [17,18], distributed algorithms for discovering such sequences found suboptimal ones. We will show optimal solutions, albeit with non-distributed algorithms.…”
Section: Infinite-length Games
confidence: 99%