Proceedings of the 26th Annual International Conference on Machine Learning 2009
DOI: 10.1145/1553374.1553422
Dynamic analysis of multiagent Q-learning with ε-greedy exploration

Abstract: The development of mechanisms to understand and model the expected behaviour of multiagent learners is becoming increasingly important as the area rapidly finds application in a variety of domains. In this paper we present a framework to model the behaviour of Q-learning agents using the ε-greedy exploration mechanism. For this, we analyse a continuous-time version of the Q-learning update rule and study how the presence of other agents and the ε-greedy mechanism affect it. We then model the problem as a system…
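To make the mechanism concrete, here is a minimal single-agent sketch of tabular Q-learning with ε-greedy exploration; the helper names, toy reward, and parameter values are illustrative assumptions, not the paper's experimental setup.

```python
import random
from collections import defaultdict

def epsilon_greedy(q, s, n_actions, epsilon):
    """With probability epsilon pick a uniformly random action;
    otherwise pick the action with the highest Q-value in state s."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q[(s, a)])

def q_step(q, s, a, r, s_next, n_actions, alpha, gamma):
    """One discrete-time Q-learning update; the paper analyses a
    continuous-time idealisation of this rule in the multiagent case."""
    target = r + gamma * max(q[(s_next, b)] for b in range(n_actions))
    q[(s, a)] += alpha * (target - q[(s, a)])

# Toy usage on a single-state, two-action problem (purely illustrative).
q = defaultdict(float)
for _ in range(1000):
    a = epsilon_greedy(q, 0, 2, epsilon=0.1)
    r = 1.0 if a == 1 else 0.0   # hypothetical reward structure
    q_step(q, 0, a, r, 0, 2, alpha=0.1, gamma=0.9)
```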

Cited by 55 publications (17 citation statements). References 16 publications.

“…Other authors [Babes et al., 2009; Gomes and Kowalczyk, 2009] independently arrived at the same expected change of Q-values. However, these sources explicitly consider ε-greedy as the policy generation function, which maps Q-values to a few discrete policies and thus does not allow the policy space of the process to be described in a self-consistent way.…”
Section: Discrepancy Between Q-learning and Its Idealized Model (supporting)
confidence: 56%
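For context, the shared result this statement refers to can be sketched as follows; this is a hedged reconstruction of the standard stateless (repeated-game) form, with x_a denoting the ε-greedy action probabilities and r_a the expected reward for action a — notation assumed here, not taken from the citing paper.

```latex
% epsilon-greedy action probabilities over an action set A
x_a = (1-\varepsilon)\,\mathbb{1}\!\left[a = \arg\max_{b} Q(b)\right] + \frac{\varepsilon}{|A|}

% expected change of a Q-value under this policy (stateless form)
\mathbb{E}[\Delta Q(a)] = \alpha\, x_a \left( r_a + \gamma \max_{b} Q(b) - Q(a) \right)
```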
“…The experimental parameters are as follows: α = 0.05, γ = 0.9, and ε = 0.85. In each episode, from line 11 to line 31, the model chooses an action a(i, j) in the current state s based on ε-greedy [21] and receives a new state s' (lines 12-13). There are two conditions under which the episode will end.…”
Section: Methodology and Learning Design (mentioning)
confidence: 99%
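A minimal sketch of the episode loop this statement describes, using the reported parameter values; the chain environment, reward, and the two termination conditions are assumptions for illustration, since the citing paper's algorithm is not reproduced here (its ε = 0.85 may also denote the greedy probability rather than the exploration rate; it is treated as the exploration rate below).

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.05, 0.9, 0.85  # parameters reported in the statement

class ToyChain:
    """Hypothetical 5-state chain environment; reaching state 4 is terminal."""
    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        # action 1 moves right, anything else moves left
        self.s = min(self.s + 1, 4) if a == 1 else max(self.s - 1, 0)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done

def run_episode(q, env, n_actions, max_steps=100):
    """One episode: epsilon-greedy action choice, Q update, and two
    termination conditions (terminal state or step cap), mirroring the
    'two conditions' mentioned in the citation statement."""
    s = env.reset()
    for _ in range(max_steps):                       # condition 2: step limit
        if random.random() < EPSILON:                # explore
            a = random.randrange(n_actions)
        else:                                        # exploit
            a = max(range(n_actions), key=lambda b: q[(s, b)])
        s_next, r, done = env.step(a)
        target = r + GAMMA * max(q[(s_next, b)] for b in range(n_actions))
        q[(s, a)] += ALPHA * (target - q[(s, a)])
        if done:                                     # condition 1: terminal state
            break
        s = s_next

q = defaultdict(float)
env = ToyChain()
for _ in range(200):
    run_episode(q, env, n_actions=2)
```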
“…An important aspect of understanding the behavior of a multiagent learning algorithm is theoretically modeling and analyzing its underlying dynamics [6,20,25].…”
Section: Theoretical Modeling and Analysis of SA-IGA (mentioning)
confidence: 99%