Exploring selfish reinforcement learning in repeated games with stochastic rewards

Verbeeck, Katja; Nowé, Ann; Parent, Johan; Tuyls, Karl

doi:10.1007/s10458-006-9007-0

Cited by 37 publications

(34 citation statements)

References 16 publications


“…In the past few years, a significant part of the research has focused on comprehending and solving single-stage multi-agent problems, modeled as a normal form game (Verbeeck, Nowé, Parent, & Tuyls, 2007) and multi-stage games modeled as Markov games (Verbeeck, Nowé, Peeters, & Tuyls, 2005). Wheeler et al have shown that a set of decentralized learning automata is able to control a finite Markov Chain with unknown transition probabilities and rewards (Wheeler & Narendra, 1986).…”

Section: Related Work (mentioning)

confidence: 99%

Speeding up learning automata based multi agent systems using the concepts of stigmergy and entropy

Masoumi

Meybodi

2011

Expert Systems with Applications


Abstract: Learning automata (LA) were recently shown to be valuable tools for designing multi-agent reinforcement learning algorithms and are able to control stochastic games. In this paper, the concepts of stigmergy and entropy are imported into learning-automata-based multi-agent systems with the purpose of providing a simple framework for interaction and coordination in multi-agent systems and speeding up the learning process. The multi-agent system considered in this paper is designed to find optimal policies in Markov games. We consider several dummy agents that walk around in the states of the environment, activate the local learning automata, and bring information so that the involved learning automata can update their local state. The entropy of the probability vector for the learning automata of the next state is used to determine reward or penalty for the actions of learning automata. The experimental results have shown that, in terms of the speed of reaching the optimal policy, the proposed algorithm has better learning performance than other learning algorithms.
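The abstract above says the entropy of the next state's action-probability vector determines reward or penalty, but does not give the rule. The following is only a minimal sketch of that idea: `entropy` is the standard Shannon entropy, while `entropy_signal`, its `threshold` parameter, and the "low entropy means reward" convention are hypothetical assumptions made for illustration, not the authors' method.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of an action-probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_signal(next_state_probs, threshold=0.5):
    """Hypothetical rule (an assumption, not taken from the paper):
    return 1 (reward) when the next state's automaton is fairly decided,
    i.e. its normalized entropy is below `threshold`, else 0 (penalty).
    """
    n = len(next_state_probs)
    # Maximum entropy is log2(n); guard the single-action case.
    max_h = math.log2(n) if n > 1 else 1.0
    normalized = entropy(next_state_probs) / max_h
    return 1 if normalized < threshold else 0
```

Under these assumptions, a nearly converged automaton such as `[0.97, 0.01, 0.01, 0.01]` yields a reward signal, while a uniform vector over four actions yields a penalty.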


“…Unfortunately, these algorithms involve searching the whole strategy space, so their convergence time is exponential. Another algorithm that uses stages to provide a stable learning environment is the ESRL algorithm for coordinated exploration (Verbeeck, Nowé, Parent, & Tuyls, 2007). Marden, Arslan, and Shamma (2007b) and Marden, Young, Arslan, and Shamma (2009) use an algorithm with experimentation and best replies but without explicit stages that converges for weakly acyclic games, where best-reply dynamics converge when agents move one at a time, rather than moving all at once, as we assume here.…”

Section: Related Work (mentioning)

confidence: 99%

Multiagent Learning in Large Anonymous Games

Kash

Friedman

Halpern

2011

Journal of Artificial Intelligence Research


In large systems, it is important for agents to learn to act effectively, but sophisticated multi-agent learning algorithms generally do not scale. An alternative approach is to find restricted classes of games where simple, efficient algorithms converge. It is shown that stage learning efficiently converges to Nash equilibria in large anonymous games if best-reply dynamics converge. Two features are identified that improve convergence. First, rather than making learning more difficult, more agents are actually beneficial in many settings. Second, providing agents with statistical information about the behavior of others can significantly reduce the number of observations needed.


“…In previous literature [17,18], distributed algorithms for discovering such sequences found suboptimal ones. We will show optimal solutions, albeit with non-distributed algorithms.…”

Section: Infinite-length Games (mentioning)

confidence: 99%

Long-term fairness with bounded worst-case losses

Balan

Richards

Luke

2009

Auton Agent Multi-Agent Syst


How does one repeatedly choose actions so as to be fairest to the multiple beneficiaries of those actions? We examine approaches to discovering sequences of actions for which the worst-off beneficiaries are treated maximally well, then secondarily the second-worst-off, and so on. We formulate the problem for the situation where the sequence of action choices continues forever; this problem may be reduced to a set of linear programs. We then extend the problem to situations where the game ends at some unknown finite time in the future. We demonstrate that an optimal solution is NP-hard, and present two good approximation algorithms.
