2020
DOI: 10.1609/aaai.v34i04.6040

Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards

Abstract: Intrinsic rewards were introduced to simulate how human intelligence works; they are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but being evaluated with extrinsic rewards. However, none of the existing intrinsic reward approaches can achieve human-level performance under this very challenging setting of intrinsically-motivated play. In this work, we propose a novel megalomania-driven intrinsic reward (called mega-reward), which, to our knowledge, is the first approach…

Citations: cited by 8 publications (6 citation statements)
References: 16 publications (39 reference statements)
“…Therefore, since distractors can be designed to produce endless new observations and to embody unlearnable-to-perfection dynamics, they can easily attract the complete attention of such algorithms and make them ignore the actual task entirely. For example, Kim et al (2019) and Song et al (2020) added pixel-level white noise to each image observation received by an agent playing Montezuma's Revenge (see Figure 11(a)). In addition, the dynamics of this noise were kept independent of the agent's actions (i.e., the agent could not turn it off or tamper with it).…”
Section: Distractors (mentioning)
confidence: 99%
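
As a concrete illustration of the distractor described in the statement above, the sketch below overlays action-independent white noise on every image observation, in the spirit of the noise distractors used by Kim et al (2019) and Song et al (2020). It assumes a classic Gym-style environment whose reset and step return image arrays; the wrapper class and parameter names are hypothetical, chosen for illustration only.

import numpy as np

class WhiteNoiseDistractor:
    """A minimal sketch of a pixel-level noise distractor.

    The noise is resampled on every step and never depends on the
    agent's action, so the agent can neither predict it nor turn it off.
    """

    def __init__(self, env, noise_scale=0.1, seed=0):
        self.env = env
        self.noise_scale = noise_scale
        self.rng = np.random.default_rng(seed)

    def _add_noise(self, obs):
        # Action-independent white noise, scaled to the 0-255 pixel range.
        noise = self.rng.normal(0.0, self.noise_scale, size=obs.shape)
        noisy = obs.astype(np.float32) + 255.0 * noise
        return np.clip(noisy, 0, 255).astype(obs.dtype)

    def reset(self):
        return self._add_noise(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._add_noise(obs), reward, done, info

An agent wrapped this way keeps seeing novel pixels no matter what it does, which is exactly why such distractors can absorb the full attention of prediction-error-based intrinsic rewards.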
“…Causal influence is also related to the concept of contingency awareness from psychology [38], that is, the knowledge that one's actions can affect the environment. On Atari games, exploring through the lens of contingency awareness has led to state-of-the-art results [39,40].…”
Section: Related Work (mentioning)
confidence: 99%
“…Our work is based on the agent-surrounding separation concept and derives an efficient state intrinsic control objective, which empowers RL agents to learn meaningful interaction and control skills without any task reward. A recent work with similar motivation (Song et al, 2020) introduces mega-reward, which aims to maximize the control capabilities of agents over given entities in a given environment and shows promising results on Atari games. Another related work (Dilokthanakul et al, 2019) proposes feature control as intrinsic motivation and shows state-of-the-art results in Montezuma's Revenge.…”
Section: Related Work (mentioning)
confidence: 99%
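
The "control capabilities" idea in the statement above can be made concrete with a toy intrinsic reward: reward the agent for how much the observed outcome deviates from what would have happened without its intervention. The sketch below is an assumption-laden illustration, not the mega-reward algorithm itself (which operates on detected entities); the no-op baseline prediction is a hypothetical input, assumed to come from a learned forward model.

import numpy as np

def control_intrinsic_reward(next_obs, predicted_noop_obs):
    # Compare the observed next state with the state a forward model
    # predicts under a no-op action; a larger gap means the agent's
    # action exerted more control over the environment.
    diff = next_obs.astype(np.float32) - predicted_noop_obs.astype(np.float32)
    return float(np.mean(np.abs(diff)))

Because action-independent distractors change identically whether or not the agent acts, they contribute nothing under such a reward, which is the property that separates control-based objectives from pure prediction-error curiosity.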