2020
DOI: 10.1609/aaai.v34i04.5995

How Should an Agent Practice?

Abstract: We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available. During practice, the environment may differ from the one available for training and evaluation with extrinsic rewards. We refer to this setup of alternating periods of practice and objective evaluation as practice-match, drawing an analogy to regimes of skill acquisition common for humans in sports and games. The agent must effectively use p…

Cited by 4 publications (4 citation statements)
References 4 publications
“…In the single-task case, the learned intrinsic reward can help accelerate learning simply by adding them to the task-defining reward [275,132]. Rajendran et al [184] consider a different kind of metalearning setting, where the agent can freely practice in the environment between regular evaluation episodes, with the idea that the most efficient kind of practicing may not be the same as just trying to maximize the task-defining reward. The agent does not have access to the environment reward during the practice episodes and instead optimizes a meta-learned intrinsic reward.…”
Section: Learning Intrinsic Rewards (mentioning)
confidence: 99%
“…Directly optimizing over these long task horizons is challenging because it can result in vanishing or exploding gradients and has infeasible memory requirements [144]. Instead, as described above, most many-shot meta-RL algorithms adopt a surrogate objective, which considers only one or a few update steps in the innerloop [275,109,229,165,184,266,274,15,230]. These algorithms use either A2C [153]-style [165,274,15,230] or DDPG [125]-style [109] actor-critic objectives in the outer-loop.…”
Section: Auxiliary Tasks (mentioning)
confidence: 99%
“…In a previous work, Rajendran et al (2020) considered a learning process composed of agnostic pre-training (called a practice) and supervised fine-tuning (a match) in a class of environments. However, in their setting the two phases are alternated, and the supervision signal of the matches allows to learn the reward for the practice through a meta-gradient.…”
Section: Related Work (mentioning)
confidence: 99%
“…We should note that Rajendran et al [37] also proposed a transfer framework based on intrinsic rewards. In their work, the agent switches between practice episodes - where the agent receives only intrinsic rewards - and match episodes - giving only extrinsic rewards.…”
Section: Learning To Explore (mentioning)
confidence: 99%
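
The citation statements above describe an alternation in which practice updates use only a learned intrinsic reward, and match episodes supply the extrinsic signal that shapes that intrinsic reward through a meta-gradient taken over a single inner update. The sketch below illustrates that idea on a toy problem and is not the method of Rajendran et al. Everything in it is an assumption made for illustration: the 3-armed bandit, the softmax policy, the use of exact expected gradients instead of sampled episodes, the learning rates, and all function names.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient(logits, rewards):
    """Exact gradient of the expected reward sum_a pi(a) * rewards[a] w.r.t. the logits."""
    pi = softmax(logits)
    baseline = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - baseline) for p, r in zip(pi, rewards)]

def practice_step(logits, intrinsic, lr):
    """Practice: one policy update that sees only the intrinsic reward."""
    grad = policy_gradient(logits, intrinsic)
    return [t + lr * g for t, g in zip(logits, grad)]

def meta_gradient(logits, intrinsic, extrinsic, lr):
    """Match: gradient of the post-practice extrinsic return w.r.t. the intrinsic reward."""
    pi = softmax(logits)
    new_logits = practice_step(logits, intrinsic, lr)
    outer = policy_gradient(new_logits, extrinsic)  # d(match return) / d(new_logits)
    # Chain rule through the practice step:
    # d new_logits[k] / d intrinsic[j] = lr * pi[k] * ((k == j) - pi[j])
    weighted = sum(o * p for o, p in zip(outer, pi))
    return [lr * pi[j] * (outer[j] - weighted) for j in range(len(pi))]

def train(extrinsic, rounds=200, inner_lr=1.0, meta_lr=1.0):
    n = len(extrinsic)
    logits = [0.0] * n      # policy parameters (hypothetical toy policy)
    intrinsic = [0.0] * n   # learned intrinsic reward, one value per arm
    for _ in range(rounds):
        # Match: the extrinsic reward is visible and is used only to update the intrinsic reward.
        mg = meta_gradient(logits, intrinsic, extrinsic, inner_lr)
        intrinsic = [e + meta_lr * g for e, g in zip(intrinsic, mg)]
        # Practice: the policy is updated from the intrinsic reward alone.
        logits = practice_step(logits, intrinsic, inner_lr)
    return logits, intrinsic

extrinsic_reward = [0.1, 0.9, 0.3]  # made-up task rewards for the three arms
logits, intrinsic = train(extrinsic_reward)
policy = softmax(logits)
print("policy:", [round(p, 3) for p in policy])
print("intrinsic reward:", [round(e, 3) for e in intrinsic])
```

In this toy setting the policy never reads the extrinsic reward directly, yet it ends up preferring the arm with the highest extrinsic reward, because the intrinsic reward is adjusted to make the one-step practice update improve match performance.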