2004
DOI: 10.1080/09528130412331297956

On-policy concurrent reinforcement learning

Abstract: When an agent learns in a multi-agent environment, the payoff it receives depends on the behaviour of the other agents. If the other agents are also learning, its reward distribution becomes non-stationary, which makes learning in multi-agent systems harder than single-agent learning. Prior attempts at value-function-based learning in such domains have used as their cornerstone off-policy Q-learning, which does not scale well, and have met with limited success. This paper studies on-policy modifications of such al…
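The excerpt cuts off before the paper's own on-policy variants, so as a point of reference only, here is a minimal sketch of the generic distinction the abstract draws: off-policy Q-learning bootstraps on the greedy action regardless of what was executed, while an on-policy learner such as SARSA bootstraps on the action its behaviour policy actually takes. The two-action setup and all names below are illustrative choices of ours, not taken from the paper.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = [0, 1]                      # illustrative two-action space
Q = defaultdict(float)                # Q[(state, action)] -> estimated value

def epsilon_greedy(state):
    """Behaviour policy used by both learners."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstraps on the greedy action at s_next,
    # whatever action the behaviour policy will actually execute.
    target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstraps on a_next, the action the behaviour
    # policy really takes at s_next (including exploration).
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```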

Cited by 11 publications (17 citation statements)
References 12 publications
“…According to Junling Hu et al., the algorithmic complexity of finding an equilibrium in matrix games is unknown. Banerjee et al. prove that minmax-Q and Nash-Q are equivalent in purely competitive domains [13]. The minmax-Q algorithm, however, has the advantage of being more resource-efficient.…”
Section: Learning Honeypots and Attackers
confidence: 98%
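For context, the maximin computation at the core of minmax-Q solves a small linear program per state of a zero-sum game: pick the mixed policy that maximises the worst-case expected payoff over the opponent's actions. The sketch below is the standard formulation under that zero-sum assumption; the function name minimax_value and the use of scipy are our illustrative choices, not taken from [13].

```python
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    """Value and maximin policy of the zero-sum matrix game Q_s.

    Q_s[i, j] is the learner's payoff for its action i against
    opponent action j; the opponent is assumed to minimise.
    """
    n, m = Q_s.shape
    # Variables: pi_1..pi_n (mixed policy) and v (game value).
    # Maximise v  <=>  minimise -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # For every opponent action j:  v - sum_i pi_i * Q_s[i, j] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # The policy must sum to one.
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n]

# Matching pennies: value 0 and the uniform maximin policy.
v, pi = minimax_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```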
“…However, a problem with such an approach is that attackers are treated as part of the environment despite their competitive nature. According to Banerjee et al. [13], treating competitors in a learning scenario as aspects of the environment means the environment is no longer stationary, and convergence results may no longer hold.…”
Section: Reinforcement Learning
confidence: 99%
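To make the non-stationarity point concrete, here is a toy illustration of our own construction (not from either paper): an attacker that treats an adapting defender as a fixed environment sees the reward distribution of one and the same action drift over time.

```python
import random

def payoff(attacker, defender):
    # Matching pennies from the attacker's side: +1 on a match, -1 otherwise.
    return 1.0 if attacker == defender else -1.0

counts = [0, 0]          # the defender's running model of the attacker

def defender_act():
    # An adapting opponent: mismatch the attacker's most frequent action.
    return 1 - (0 if counts[0] >= counts[1] else 1)

rewards = {0: [], 1: []}
for t in range(1000):
    a = random.choice([0, 1]) if t < 500 else 0   # attacker settles on 0
    r = payoff(a, defender_act())
    counts[a] += 1
    rewards[a].append(r)

# Early on, action 0's payoff varies with the defender's beliefs; once the
# defender has locked onto it, the very same action pays -1 every time.
# From the attacker's viewpoint the 'environment' is non-stationary.
early, late = rewards[0][:100], rewards[0][-100:]
print(sum(early) / len(early), sum(late) / len(late))
```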
“…This makes the current focus on multiagent learning research timely and justified. Several algorithms for multiagent learning have been proposed [2,5,7,8], most of them guaranteed to converge to an equilibrium in the limit. It is noted in [3] that none of these methods simultaneously satisfies rationality and convergence, two desirable criteria for any multiagent learning algorithm.…”
Section: Introduction
confidence: 99%