2019
DOI: 10.48550/arxiv.1912.05500
Preprint

What Can Learned Intrinsic Rewards Capture?

Abstract: Reinforcement learning agents can include different components, such as policies, value functions, state representations, and environment models. Any or all of these can be the loci of knowledge, i.e., structures where knowledge, whether given or learned, can be deposited and reused. The objective of an agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. As far as the learning algorithm is concerned, these rewards are typically given and immutable. In this paper we …
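For reference, a standard formalisation of the objective the abstract describes, written with an assumed discount factor \gamma (the excerpt does not say how the sum is kept finite):

J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t)\right], \qquad \pi^{*} = \arg\max_{\pi} J(\pi)

The learned intrinsic rewards discussed in the citing passages below augment or replace r in the agent's own learning loop, while the extrinsic task reward still defines success.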

Cited by 10 publications (6 citation statements) | References 12 publications

Citation statements (ordered by relevance):
“…The optimal reward framework [60,63] and shaped rewards [47] (if generated by the agent itself) also consider intrinsic motivation as a way to assist an RL agent in learning the optimal policy for a given task. Such an intrinsically motivated reward signal has previously been learned through various methods such as evolutionary techniques [49,57], meta-gradient approaches [62,72,73], and others. The Wasserstein distance has been used to present a valid reward for imitation learning [70,17] as well as program synthesis [24].…”
Section: Intrinsic Motivation (mentioning)
confidence: 99%
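To make the meta-gradient idea referenced above concrete, here is a deliberately tiny, self-contained sketch rather than the method of any cited paper: a softmax bandit policy takes one policy-gradient step on a learned per-action intrinsic reward, and the intrinsic-reward parameters are then updated by differentiating the post-update extrinsic return through that inner step. The toy task, step sizes, and all names are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

# Toy setting: a single-state, 4-action bandit. The extrinsic (task) reward is
# nonzero only for the last action; eta is a learned per-action intrinsic reward.
K = 4
ext_reward = jnp.array([0.0, 0.0, 0.0, 1.0])  # extrinsic reward per action
alpha, beta = 1.0, 0.5                        # inner / outer step sizes (assumed)

def expected_return(theta, reward):
    """Expected reward of the softmax policy pi_theta under a given reward vector."""
    return jnp.dot(jax.nn.softmax(theta), reward)

def inner_update(theta, eta):
    """One exact policy-gradient step on the *intrinsic* reward only."""
    return theta + alpha * jax.grad(expected_return)(theta, eta)

def meta_objective(eta, theta):
    """Extrinsic return of the policy *after* the inner update: the meta-gradient target."""
    return expected_return(inner_update(theta, eta), ext_reward)

theta = jnp.zeros(K)  # policy logits
eta = jnp.zeros(K)    # intrinsic-reward parameters

for _ in range(200):
    # Outer (meta) step: shape the intrinsic reward so the inner update helps the task.
    eta = eta + beta * jax.grad(meta_objective)(eta, theta)
    # Inner step: the policy itself only ever sees the intrinsic reward.
    theta = inner_update(theta, eta)

print(jax.nn.softmax(theta))  # should concentrate on the extrinsically rewarded action
```

In the cited meta-gradient approaches the exact expectations above are replaced by sampled rollouts and the intrinsic reward by a learned function of observations, but differentiating the task return through the agent's own update is roughly the same structure.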
“…Unfortunately, the optimal policy under such modified rewards might sometimes differ from the optimal policy under the task reward [47,16]. The problem of learning a reward signal that speeds up learning by communicating what to do, without interfering by specifying how to do it, is thus a useful and complex one [73].
Section: Introduction (mentioning)
confidence: 99%
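Background on the interference issue noted above (standard reward-shaping theory, not a claim about the cited papers' constructions): shaping bonuses of the potential-based form

F(s, a, s') = \gamma\,\Phi(s') - \Phi(s), \qquad r'(s, a, s') = r(s, a, s') + F(s, a, s')

for an arbitrary state potential \Phi leave the set of optimal policies of the original task unchanged, whereas an arbitrary additive bonus can change which policy is optimal.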
“…The optimal reward framework [33,35] and shaped rewards [23] (if generated by the agent itself) also consider intrinsic motivation as a way to assist an RL agent in learning the optimal policy for a given task. Such an intrinsically motivated reward signal has previously been learned through various methods such as evolutionary techniques [24,30], meta-gradient approaches [34,39,40], and others. The Wasserstein distance, in particular, has been used to present a valid reward for speeding up learning of goal-conditioned policies [11], imitation learning [37,10,38], as well as program synthesis [15].…”
Section: Related Work (mentioning)
confidence: 99%
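As a loose illustration of the Wasserstein-as-reward idea mentioned above (not the estimator used in any cited paper), progress can be scored as the negative Wasserstein distance between the states an agent visits and samples from a goal distribution. The 1-D setup, the samples, and the function name are all hypothetical:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_reward(visited_states, goal_states):
    """Negative 1-D Wasserstein distance between visited states and goal samples:
    less negative means the visitation distribution is closer to the goal."""
    return -wasserstein_distance(visited_states, goal_states)

rng = np.random.default_rng(0)
goal_states = rng.normal(loc=5.0, scale=0.1, size=256)   # goal distribution near 5.0

far = rng.normal(loc=0.0, scale=1.0, size=256)    # trajectory far from the goal
near = rng.normal(loc=4.8, scale=0.5, size=256)   # trajectory close to the goal

print(wasserstein_reward(far, goal_states))   # strongly negative
print(wasserstein_reward(near, goal_states))  # much closer to zero
```

Unlike a sparse task reward, this distance shrinks smoothly as behaviour approaches the goal, which is why it can serve as a denser learning signal; the cited works deal with high-dimensional states and correspondingly more involved estimators of the distance.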
“…Another core problem of continual RL that meta-learning can potentially help with is exploration. Meta-learning has been repeatedly used in the recent literature to learn functions for intrinsic motivation and improved exploration (Baranes and Oudeyer, 2009; Zheng et al., 2018; Xu et al., 2018a; Yang et al., 2019; Zou et al., 2019; Zheng et al., 2019).…”
Section: Learning To Explore (mentioning)
confidence: 99%