2017
DOI: 10.48550/arxiv.1709.04326
Preprint

Learning with Opponent-Learning Awareness

Abstract: Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multiagent reinforcement learning, but also can be extended to hierarchical reinforcement learning, generative adversarial networks and decentralised optimization. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LO…

citations
Cited by 27 publications
(33 citation statements)
references
References 19 publications
“…Ref. [625] shows that naive and commonly defecting reinforcement learners start to cooperate when they incorporate in their own learning process the awareness of their opponent's learning. Appropriately dubbed learning with opponent-learning awareness or LOLA, the approach leads to the emergence of tit-for-tat and consequent cooperation in the iterated prisoners' dilemma.…”
Section: AI Agents for Promoting Cooperation (mentioning)
confidence: 99%

Social physics
Jusup, Holme, Kanazawa et al. 2021. Preprint
“…One of the best-performing strategies in terms of the overall average score is the Desired Belief Strategy [632], which actively analyses the opponent and responds depending on whether the opponent's action is perceived as noise or a genuine behavioural change. Ultimately, an inescapable conclusion is that reinforcement learning is an effective means to construct strong strategies for various iterated social-dilemma situations [625,631,633,634].…”
Section: AI Agents for Promoting Cooperation (mentioning)
confidence: 99%

Social physics
Jusup, Holme, Kanazawa et al. 2021. Preprint
“…For example, Lockhart et al [2019] performs direct policy optimization against worst-case opponents and effectively finds an NE in Kuhn Poker and Goofspiel card game. Foerster et al [2017] invented LOLA where each agent shapes learning of other agents. It gave the highest average returns on the iterated prisoners' dilemma (IPD).…”
Section: Introduction (mentioning)
confidence: 99%
“…In other words, we perform our experiments in scenarios with a similar nature to the one depicted in Figure 1 that essentially require all agents to work together and success cannot be achieved by any of them individually. Our main contributions are as follows: [8]. Additionally, the idea of decorrelating training samples by drawing them from an experience replay buffer becomes obsolete and a multi-agent derivation of importance sampling can be employed to remove the outdated samples from the replay buffer [9].…”
Section: Introduction (mentioning)
confidence: 99%