2019
DOI: 10.48550/arxiv.1906.07865
Preprint

Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study

Cam Linke,
Nadia M. Ady,
Martha White
et al.

Abstract: Learning about many things can provide numerous benefits to a reinforcement learning system. For example, learning many auxiliary value functions, in addition to optimizing the environmental reward, appears to improve both exploration and representation learning. The question we tackle in this paper is how to sculpt the stream of experience (how to adapt the system's behaviour) to optimize the learning of a collection of value functions. A simple answer is to compute an intrinsic reward based on the statistics o…

Cited by 3 publications (5 citation statements)
References 40 publications (64 reference statements)
“…Surprisal [Achiam and Sastry, 2017] and model disagreement [Pathak et al., 2019] present computationally tractable alternatives to information gain, at the cost of the accuracy of the estimation. For comprehensive reviews of intrinsic motivation signal choices, see [Aubret et al., 2019; Linke et al., 2019]. In this work, we present a novel method for estimating learning progress that is "consistent" with the original prediction gain objective while also scaling to high-dimensional continuous action spaces.…”
Section: Artificial Intelligence Literature
Classified as: mentioning (confidence: 99%)
“…Information Gain [Houthooft et al., 2016; Linke et al., 2019] based methods seek to minimize uncertainty in the Bayesian posterior distribution over model parameters:…”
Section: Curiosity Signals
Classified as: mentioning (confidence: 99%)
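The quoted sentence ends where the cited equation was cut off in extraction. As a hedged reconstruction (based on the standard VIME formulation of Houthooft et al., 2016, not recovered from this page), the information-gain intrinsic reward is the KL divergence between the posterior over model parameters before and after observing a transition:

$$ r^{\text{int}}_t = D_{\mathrm{KL}}\big[\, p(\theta \mid h_t, a_t, s_{t+1}) \,\big\|\, p(\theta \mid h_t) \big] $$

where $h_t$ is the agent's interaction history up to time $t$, $a_t$ the action taken, and $s_{t+1}$ the resulting state.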
“…Similarly, exploration can be induced by adding noise to ANN parameters [1398, 1399]. Other approaches to exploration include rewarding actors for increasing action entropy [1399–1401] and intrinsic motivation [1402–1404], where ANNs are incentivized to explore actions that they are unsure about.…”
Section: Reinforcement Learning
Classified as: mentioning (confidence: 99%)
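The entropy-bonus idea in the quote above can be made concrete: the actor's objective adds a term proportional to the entropy of its action distribution, so near-deterministic policies are penalized relative to exploratory ones. A minimal sketch (the distributions and the `beta` coefficient are illustrative assumptions, not values from the cited works):

```python
import math

def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) * log pi(a), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def bonus_objective(reward, probs, beta=0.01):
    """Reward augmented with an entropy bonus, as in entropy-regularized actor-critic."""
    return reward + beta * entropy(probs)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximally exploratory policy
peaked = [0.97, 0.01, 0.01, 0.01]   # near-deterministic policy
# The uniform policy earns the larger bonus, nudging the actor to keep exploring.
```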
“…On the other hand, many intrinsic rewards have been proposed to encourage exploration, inspired by animal behaviours. Examples include prediction error (Schmidhuber, 1991a; 1991b; Oudeyer et al., 2007; Gordon & Ahissar, 2011; Mirolli & Baldassarre, 2013; Pathak et al., 2017), surprise (Itti & Baldi, 2006), weight change (Linke et al., 2019), and state-visitation counts (Sutton, 1990; Poupart et al., 2006; Strehl & Littman, 2008; Bellemare et al., 2016; Ostrovski et al., 2017). Although these kinds of intrinsic rewards are not domain-specific, they are often not well-aligned with the task the agent tries to solve, and they ignore the effect on the agent's learning dynamics.…”
Section: Related Work
Classified as: mentioning (confidence: 99%)
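Of the intrinsic-reward signals listed above, state-visitation counts are the simplest to sketch. A minimal tabular illustration (the `beta` coefficient and the dictionary-based counter are assumptions for illustration; works such as Bellemare et al., 2016 generalize exact counts to pseudo-counts for large state spaces):

```python
import math
from collections import defaultdict

def count_bonus(counts, state, beta=0.1):
    """Count-based exploration bonus beta / sqrt(N(s)).

    The bonus decays as a state is revisited, steering the agent
    toward rarely seen states.
    """
    counts[state] += 1
    return beta / math.sqrt(counts[state])

counts = defaultdict(int)
first = count_bonus(counts, "s0")   # first visit:  0.1 / sqrt(1) = 0.1
second = count_bonus(counts, "s0")  # second visit: 0.1 / sqrt(2), smaller
```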