2013
DOI: 10.1007/s10458-013-9222-4

Multiagent learning in the presence of memory-bounded agents

Cited by 31 publications (26 citation statements)
References 21 publications
“…Meanwhile, individually rational behavior is employed when playing against a selfish agent. A similar idea of adaptively behaving differently against different opponents was also employed in previous algorithms [10,12,17,19]. However, all the existing works focus on maximizing an agent's individual payoff against different opponents in different types of games, but do not directly take into consideration the goal of maximizing social welfare (e.g., cooperating in the prisoner's dilemma game).…”
Section: Related Work
confidence: 99%
“…These types of players can be thought of as finite automata that take the D most recent actions of the opponent and use this history to compute their policy [41]. Since memory-bounded opponents are a special case of opponents, different algorithms were developed specifically to be used against these agents [11,13]. For example, the agent LoE-AIM [12] is designed to play against a memory-bounded player (but does not know the exact memory length).…”
Section: Model Based Approaches
confidence: 99%
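The memory-bounded players described in this excerpt can be illustrated with a short sketch. The class below is a minimal, hypothetical example (the names and structure are not taken from the cited papers): the opponent's policy is a lookup table keyed by the D most recent actions of the other player, which is precisely the finite-automaton view quoted above.

```python
import random

# Minimal, hypothetical sketch (names and structure are not from the cited
# papers): a memory-bounded player whose policy is a lookup table over the
# D most recent actions of the other player, i.e. a finite automaton whose
# state is that bounded history.
class MemoryBoundedOpponent:
    def __init__(self, memory_length, policy_table, default_action="C"):
        self.memory_length = memory_length  # D: how many recent actions are remembered
        self.policy_table = policy_table    # history tuple -> probability of playing "C"
        self.default_action = default_action
        self.history = ()                   # most recent observed actions, oldest first

    def act(self):
        p_cooperate = self.policy_table.get(self.history)
        if p_cooperate is None:             # unseen history: fall back to the default
            return self.default_action
        return "C" if random.random() < p_cooperate else "D"

    def observe(self, other_action):
        # Slide the bounded window: keep only the D most recent actions.
        self.history = (self.history + (other_action,))[-self.memory_length:]

# Example: memory-1 tit-for-tat in the prisoner's dilemma (copy the last action).
tit_for_tat = MemoryBoundedOpponent(
    memory_length=1,
    policy_table={(): 1.0, ("C",): 1.0, ("D",): 0.0},
)
```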
“…R-max# starts by initializing the counters n(s, a) = n(s, a, s') = r(s, a) = 0, rewards to rmax and transitions to a fictitious state s0 (like R-max), and the set of known pairs K = ∅ (lines 1-4). Then, in each round the algorithm checks, for each state-action pair (s, a) that is labeled as known (∈ K), how many rounds have passed since the last update (lines 8-9); if this number is greater than the threshold τ, then the reward for that pair is set to rmax, the counters n(s, a), n(s, a, s') and the transition function T(s, a, s') are reset, and a new policy is computed (lines 10-14). Then, the algorithm behaves as R-max (lines 15-24).…”
Section: R-max#
confidence: 99%
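A minimal sketch of the reset step described in this excerpt, assuming a simple dictionary-based model representation (the function name and data layout are illustrative, not the authors' implementation): a known pair that has gone more than τ rounds without an update gets its optimistic reward rmax back and is routed to the fictitious state s0 so the planner will re-explore it against a possibly changed opponent.

```python
# Minimal sketch of the reset step described above (hypothetical data layout,
# not the authors' code). `known` is a set of (s, a) pairs; the remaining
# arguments are dictionaries keyed by (s, a).
def rmax_sharp_reset(known, last_update, reward, n_sa, n_sas, transition,
                     current_round, tau, r_max, s0):
    stale = [sa for sa in known if current_round - last_update[sa] > tau]
    for sa in stale:
        reward[sa] = r_max          # restore the optimistic reward rmax
        n_sa[sa] = 0                # reset the visit counter n(s, a)
        n_sas[sa] = {}              # reset the transition counts n(s, a, s')
        transition[sa] = {s0: 1.0}  # route the pair back to the fictitious state s0
        known.discard(sa)           # the pair is no longer treated as known
    # The caller recomputes the policy if anything was reset, then acts as R-max.
    return bool(stale)
```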
“…These include learning from fictitious play [33], memory-bounded learning [34] to compensate for non-stationary strategies by the opposition, and cultural learning through reinforcement and replicator dynamics [35]. The models in this paper augment this latter approach to develop motivated learning agents.…”
Section: Assumptions and Related Work
confidence: 99%
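As a point of reference for the fictitious-play approach mentioned in this excerpt, here is a minimal sketch of classical fictitious play (textbook form, not a reconstruction of reference [33]): the agent keeps empirical counts of the opponent's past actions and best-responds to the induced mixed strategy.

```python
import numpy as np

# Minimal sketch of classical fictitious play (textbook form, not a
# reconstruction of reference [33]): keep empirical counts of the opponent's
# past actions and best-respond to the resulting mixed strategy.
def fictitious_play_step(payoff_matrix, opponent_counts):
    # payoff_matrix[i, j]: my payoff for playing i against opponent action j.
    empirical = opponent_counts / opponent_counts.sum()
    expected = payoff_matrix @ empirical   # expected payoff of each of my actions
    return int(np.argmax(expected))        # best response to the empirical mixture

# Example in the prisoner's dilemma (index 0 = cooperate, 1 = defect):
pd_payoffs = np.array([[3.0, 0.0],
                       [5.0, 1.0]])
counts = np.array([4.0, 1.0])              # opponent cooperated 4 times, defected once
action = fictitious_play_step(pd_payoffs, counts)   # 1, i.e. defect
```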