Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '03), 2003
DOI: 10.1145/860685.860686

Adaptive policy gradient in multiagent learning

Cited by 15 publications (24 citation statements), citing publications 2011–2021
References: 0 publications
“…The model of repeated games has been adopted by different researchers to represent complex multiagent interaction schemas. Computer scientists, however, have mainly focused on the stationary solutions of repeated games, using its repetitive property only as a means to implement an iterative algorithm searching for a stationary equilibrium solution (Bowling & Veloso, 2002; Banerjee & Peng, 2003; Conitzer & Sandholm, 2007). While stationary equilibria are the appropriate solutions in repeated games, their corresponding set of players' payoffs is typically very restricted.…”
Section: Results
confidence: 99%
“…As a matter of fact, the fundamental repetitive property of repeated games has typically been reduced to a way of giving an adaptive (or learning) algorithm sufficient time to converge (often jointly with another learning agent) to a fixed behavior (Bowling & Veloso, 2002; Banerjee & Peng, 2003; Conitzer & Sandholm, 2007; Burkov & Chaib-draa, 2009). Thus, the repetitive nature of the game has only been considered a permissive property, that is, a property that permits implementing an algorithm.…”
Section: Introduction
confidence: 99%
“…In an effort to overcome the drawback of declining solution quality on large problem instances, we have integrated the variable step-size hill-climbing algorithm PDWoLF [9] into several legacy ACO algorithms. The following reviews policy-based learning, variable-step hill-climbing, and PDWoLF.…”
Section: Policy Dynamics Win or Learn Fast (PDWoLF)
confidence: 99%
“…In the Policy Dynamics Win or Learn Fast Policy Hill-Climbing (PDWoLF-PHC) algorithm [9], Banerjee and Peng replaced the notion of average policy contained in equation (5) with one that uses the gradient of the policy. Otherwise, WoLF-PHC and PDWoLF-PHC are identical.…”
Section: Policy Dynamics Win or Learn Fast (PDWoLF)
confidence: 99%
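Since the quoted passage is terse, the following minimal Python sketch may help make the distinction concrete. It assumes a tabular Q-learner with a mixed policy; the class name, parameters (delta_win, delta_lose), and bookkeeping are illustrative choices, not the authors' code. The point of interest is the per-state-action test on the policy dynamics (Delta * Delta^2 < 0), which PDWoLF-PHC uses in place of WoLF-PHC's comparison of the current policy's expected value against an average policy.

```python
import random
from collections import defaultdict

class PDWoLFPHC:
    """Illustrative sketch of a PDWoLF-PHC learner, not the authors' code.

    Assumes a finite action set with at least two actions.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.Q = defaultdict(float)                              # Q[(s, a)]
        self.pi = defaultdict(lambda: 1.0 / len(self.actions))   # pi[(s, a)]
        self.d = defaultdict(float)   # Delta(s, a): last change in pi(s, a)
        self.d2 = defaultdict(float)  # Delta^2(s, a): change in Delta(s, a)

    def choose(self, s):
        """Sample an action from the current mixed policy pi(s, .)."""
        r, acc = random.random(), 0.0
        for a in self.actions:
            acc += self.pi[(s, a)]
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, s, a, reward, s_next):
        # Ordinary Q-learning backup for the executed action.
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (
            reward + self.gamma * best_next - self.Q[(s, a)])

        greedy = max(self.actions, key=lambda b: self.Q[(s, b)])
        old = {b: self.pi[(s, b)] for b in self.actions}
        for b in self.actions:
            # PDWoLF test: the agent is "winning" at (s, b) when the last
            # policy change and its second difference have opposite signs
            # (Delta * Delta^2 < 0); then it steps cautiously, else fast.
            lr = (self.delta_win
                  if self.d[(s, b)] * self.d2[(s, b)] < 0
                  else self.delta_lose)
            # Hill-climb toward the greedy action, as in PHC.
            step = lr if b == greedy else -lr / (len(self.actions) - 1)
            self.pi[(s, b)] = min(1.0, max(0.0, self.pi[(s, b)] + step))

        # Renormalise, then record the policy dynamics for the next test.
        total = sum(self.pi[(s, b)] for b in self.actions)
        for b in self.actions:
            self.pi[(s, b)] /= total
            change = self.pi[(s, b)] - old[b]
            self.d2[(s, b)] = change - self.d[(s, b)]
            self.d[(s, b)] = change
```

The attraction of this criterion, as the citing paper notes, is that winning is judged purely from the learner's own policy dynamics rather than from a value comparison against a maintained average policy; everything else matches WoLF-PHC.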