Multiagent learning in the presence of memory-bounded agents

Chakraborty, Debayan; Stone, Peter

doi:10.1007/s10458-013-9222-4

Cited by 31 publications

(26 citation statements)

References 21 publications


“…Meanwhile, individually rational behavior is employed when playing against a selfish agent. Similar idea of adaptively behaving differently against different opponents was also employed in previous algorithms [10,12,17,19]. However, all the existing works focus on maximizing an agent's individual payoff against different opponents in different types of games, but do not directly take into consideration the goal of maximizing social welfare (e.g., cooperate in the prisoner's dilemma game).…”

Section: Related Work (mentioning)

confidence: 99%

SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes

Zhang, Hao, et al. (2019). Auton Agent Multi-Agent Syst


In multiagent environments, the capability of learning is important for an agent to behave appropriately in the face of unknown opponents and dynamic environments. From the system designer's perspective, it is desirable if the agents can learn to coordinate towards socially optimal outcomes, while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient ascent based algorithm (SA-IGA) which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory, and SA-IGA is shown to have linear dynamics for a wide range of games including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in detail. Based on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on the Q-learning update rule. Simulation results show that an SA-PGA agent can achieve higher social welfare than the previous social-optimality oriented Conditional Joint Action Learner (CJAL) and is also robust against individually rational opponents by reaching Nash equilibrium solutions.



“…These types of players can be thought of as a finite automata that take the D most recent actions of the opponent and use this history to compute their policy [41]. Since memory bounded opponents are a special case of opponents, different algorithms were specially developed to be used against these agents [11,13]. For example, the agent LoE-AIM [12] is designed to play against a memory bounded player (but does not know the exact memory length).…”

Section: Model Based Approaches (mentioning)

confidence: 99%
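
The statement above describes memory-bounded players as finite automata that condition on the D most recent opponent actions. A minimal Python sketch of that idea follows. The class name `MemoryBoundedOpponent`, the `tit_for_tat` policy, and the action labels "C" and "D" are hypothetical illustrations, not taken from the cited papers.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class MemoryBoundedOpponent:
    """Hypothetical player whose next action depends only on the
    D most recent actions it has observed from the other player."""

    def __init__(self, memory_size: int,
                 policy: Callable[[Tuple[str, ...]], str]) -> None:
        # A deque with maxlen=D silently discards anything older than D steps.
        self.history: Deque[str] = deque(maxlen=memory_size)
        self.policy = policy

    def act(self) -> str:
        # The policy sees only the bounded history, oldest action first.
        return self.policy(tuple(self.history))

    def observe(self, other_action: str) -> None:
        # Record the other player's latest action.
        self.history.append(other_action)


def tit_for_tat(history: Tuple[str, ...]) -> str:
    """Assumed example policy with D = 1: cooperate first, then copy
    the other player's last action."""
    return history[-1] if history else "C"
```

With `memory_size=1` and `tit_for_tat`, the player cooperates on the first round and afterwards mirrors whatever it last observed. A learner that does not know D, as in the quoted description of LoE-AIM, would have to infer the memory length from play; that part is not sketched here.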

“…R-max# starts by initializing the counters n(s, a) = n(s, a, s') = r(s, a) = 0, rewards to rmax and transitions to a fictitious state s_0 (like R-max) and set of known pairs K = ∅ (lines 1-4). Then, for each round the algorithm checks for each state-action pair (s, a) that is labeled as known (∈ K) how many rounds have passed since the last update (lines 8-9); if this number is greater than the threshold τ then the reward for that pair is set to rmax, the counters n(s, a), n(s, a, s') and the transition function T(s, a, s') are reset and a new policy is computed (lines 10-14). Then, the algorithm behaves as R-max (lines 15-24).…”

Section: R-max# (mentioning)

confidence: 99%
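
The reset step described in the statement above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `reset_stale_pairs`, the dictionary layout, and the per-pair `last_update` table are hypothetical, and removing a reset pair from K is an assumption that the quoted text does not state explicitly.

```python
from typing import Dict, Set, Tuple

StateAction = Tuple[str, str]


def reset_stale_pairs(
    known: Set[StateAction],
    last_update: Dict[StateAction, int],
    counts: Dict[StateAction, int],
    transition_counts: Dict[StateAction, Dict[str, int]],
    rewards: Dict[StateAction, float],
    current_round: int,
    tau: int,
    r_max: float,
) -> bool:
    """Reset every known (s, a) pair not updated for more than tau rounds.

    Returns True if at least one pair was reset, in which case the
    caller should recompute its policy.
    """
    stale = [sa for sa in known if current_round - last_update[sa] > tau]
    for sa in stale:
        rewards[sa] = r_max          # optimistic reward invites re-exploration
        counts[sa] = 0               # n(s, a)
        transition_counts[sa] = {}   # n(s, a, s') and hence T(s, a, s')
        known.discard(sa)            # assumption: the pair must be relearned
    return bool(stale)
```

After a reset, the optimistic reward makes the pair attractive again, so an R-max style planner revisits it and can notice whether the opponent's behavior has changed.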

An exploration strategy for non-stationary opponents

Hernández-Leal, Zhan, Taylor, et al. (2016). Auton Agent Multi-Agent Syst


The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy in time. We use a two agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent's objective is to learn a model of the opponent's strategy to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model in terms of finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally how using DE outperforms the state of the art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.

Most of this work was performed while the first author was a graduate student at INAOE. This paper extends the paper "Exploration strategies to detect strategy switches" presented at the Adaptive Learning Agents workshop [27].


“…These include learning from fictitious play [33], memory bounded learning [34] to compensate for non-stationary strategies by the opposition and cultural learning through reinforcement and replicator dynamics [35]. The models in this paper augment this latter approach to develop motivated learning agents.…”

Section: Assumptions and Related Work (mentioning)

confidence: 99%

The Role of Implicit Motives in Strategic Decision-Making: Computational Models of Motivated Learning and the Evolution of Motivated Agents

Kasmarik (2015). Games


Individual behavioral differences in humans have been linked to measurable differences in their mental activities, including differences in their implicit motives. In humans, individual differences in the strength of motives such as power, achievement and affiliation have been shown to have a significant impact on behavior in social dilemma games and during other kinds of strategic interactions. This paper presents agent-based computational models of power-, achievement-and affiliation-motivated individuals engaged in game-play. The first model captures learning by motivated agents during strategic interactions. The second model captures the evolution of a society of motivated agents. It is demonstrated that misperception, when it is a result of motivation, causes agents with different motives to play a given game differently. When motivated agents who misperceive a game are present in a population, higher explicit payoff can result for the population as a whole. The implications of these results are discussed, both for modeling human behavior and for designing artificial agents with certain salient behavioral characteristics.

