2017
DOI: 10.48550/arxiv.1703.01732
Preprint

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

Abstract: Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as ε-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress. Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an …
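As a rough illustration of the surprise-based bonus the abstract alludes to, the sketch below adds the negative log-likelihood of each observed transition under a learned dynamics model to the environment reward. This is a minimal sketch, not the paper's implementation; the DynamicsModel class, the hidden sizes, and the coefficient eta are illustrative assumptions.

# Hedged sketch: surprisal-style intrinsic reward from a learned dynamics model.
# Names (DynamicsModel, eta) are illustrative, not the paper's exact API.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts a Gaussian over the next state from (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * state_dim),  # mean and log-std per state dim
        )

    def log_prob(self, state, action, next_state):
        mean, log_std = self.net(torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.exp())
        return dist.log_prob(next_state).sum(dim=-1)

def augmented_reward(env_reward, model, state, action, next_state, eta=0.1):
    # Surprisal bonus: -log p(s' | s, a); large when the transition is unexpected.
    with torch.no_grad():
        surprise = -model.log_prob(state, action, next_state)
    return env_reward + eta * surprise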

Cited by 60 publications (107 citation statements)
References 4 publications
“…The utilisation of forward modelling error as reward signal has been implemented in deterministic and probabilistic settings (Achiam and Sastry, 2017; Shelhamer et al., 2016) and is commonly known as curiosity learning. The error reward signal encourages the exploration of unfamiliar parts of the state-action space which are not yet well predictable.…”
Section: Related Work
confidence: 99%
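A minimal sketch of the deterministic variant described above, where a forward model's squared prediction error serves both as the training loss and as the intrinsic reward. ForwardModel, the network sizes, and the online update are illustrative assumptions rather than the cited papers' code.

# Hedged sketch: deterministic forward-model error as curiosity reward.
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def curiosity_step(model, optimiser, state, action, next_state):
    pred = model(state, action)
    error = ((pred - next_state) ** 2).mean(dim=-1)  # per-transition MSE
    optimiser.zero_grad()
    error.mean().backward()                          # fit the model online
    optimiser.step()
    return error.detach()                            # use as intrinsic reward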
“…This is often done by providing intrinsic motivation via a self-derived reward, resulting in "curiosity-driven" behaviour [8,4]. Such approaches include surprise [1,5] (where experiencing unexpected dynamics is rewarded), and empowerment [40,20] (where the agent prefers states in which it has more control).…”
Section: Related Work
confidence: 99%
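To make the empowerment notion above concrete, here is a hedged toy example for a deterministic, tabular MDP, where the one-step channel capacity max_p I(A; S') reduces to the log of the number of distinct reachable next states; the transition table and function names are hypothetical.

# Hedged sketch: one-step empowerment in a small deterministic, tabular MDP.
import math

def one_step_empowerment(transitions, state):
    """transitions: dict mapping (state, action) -> next_state (deterministic)."""
    reachable = {s_next for (s, a), s_next in transitions.items() if s == state}
    return math.log(len(reachable)) if reachable else 0.0

# Tiny example: state 0 can reach two distinct states, state 1 only one.
T = {(0, "left"): 1, (0, "right"): 2, (1, "left"): 1, (1, "right"): 1}
print(one_step_empowerment(T, 0))  # log 2 -> preferred state (more control)
print(one_step_empowerment(T, 1))  # log 1 = 0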
“…On the other hand, we can show that the formulation of the metric learning loss is susceptible to embedding explosion if the representation space is left unconstrained. In our work, we build upon the DBC model in an attempt to tackle both problems: (1) we address embedding explosion by stabilizing the state representation space via a norm constraint and (2) we prevent embedding collapse by altering the encoder training method.…”
Section: Introduction
confidence: 99%
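A hedged sketch of one way to impose the norm constraint mentioned above: project encoder outputs back onto a ball of fixed radius so pairwise embedding distances stay bounded. The encoder architecture, embed_dim, and max_norm are illustrative assumptions, not the cited paper's exact design.

# Hedged sketch: norm-constrained state encoder to prevent embedding explosion.
import torch
import torch.nn as nn

class NormConstrainedEncoder(nn.Module):
    def __init__(self, obs_dim, embed_dim=50, max_norm=10.0):
        super().__init__()
        self.max_norm = max_norm
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, obs):
        z = self.net(obs)
        # Rescale any embedding whose L2 norm exceeds max_norm back onto the
        # ball, keeping pairwise distances (and the metric loss) bounded.
        norm = z.norm(dim=-1, keepdim=True).clamp(min=1e-8)
        scale = torch.clamp(self.max_norm / norm, max=1.0)
        return z * scale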
“…Exploration in RL: Exploration is one of the most important issues in model-free RL, as there is the key assumption that all state-action pairs must be visited infinitely often to guarantee the convergence of the Q-function [56]. In order to explore diverse state-action pairs in the joint state-action space, various methods have been considered in prior works: intrinsically-motivated reward based on curiosity [5,11], model prediction error [1,10], information gain [26,28,29], and counting states [33,35]. These exploration techniques improve exploration and performance in challenging sparse-reward environments [3,10,13].…”
Section: Related Work
confidence: 99%
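As a concrete instance of the counting-based family listed above, here is a hedged sketch of a simple count-based bonus for tabular states, r_total = r_env + beta / sqrt(N(s)); beta and the dictionary counter are illustrative choices rather than any specific cited method.

# Hedged sketch: count-based exploration bonus for hashable (tabular) states.
from collections import defaultdict
import math

class CountBonus:
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def __call__(self, state, env_reward):
        self.counts[state] += 1
        bonus = self.beta / math.sqrt(self.counts[state])  # decays with visits
        return env_reward + bonus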
“…Starting from the left-lower corner (0.5, 0.5), the agent explores the maze without any external reward. First, note that for this pure exploration task, the optimal policy maximizing J_MaxEnt(π) is given by the uniform policy that selects all actions in [−1, 1] uniformly regardless of the value of s_t. This is because the uniform distribution has maximum entropy for a bounded space [14].…”
Section: Saturation
confidence: 99%
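The maximum-entropy claim follows from a standard argument; a short derivation for a bounded action set is sketched here for completeness (not taken from the cited text beyond its reference [14]):

For a bounded action set $\mathcal{A}$ with finite volume $|\mathcal{A}|$ and any density $p$ supported on it, non-negativity of the KL divergence to the uniform density $u(a) = 1/|\mathcal{A}|$ gives
\[
0 \le D_{\mathrm{KL}}(p \,\|\, u) = \int_{\mathcal{A}} p(a) \log \frac{p(a)}{1/|\mathcal{A}|}\, da = \log |\mathcal{A}| - H(p),
\]
so $H(p) \le \log |\mathcal{A}|$, with equality if and only if $p = u$. Hence the uniform policy maximises the per-state entropy term of $J_{\mathrm{MaxEnt}}(\pi)$.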