Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/368

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning

Abstract: The ability to transfer knowledge to novel environments and tasks is a sensible desideratum for general learning agents. Despite its apparent promise, transfer in RL is still an open and little-explored research area. In this paper, we take a new perspective on transfer: we suggest that the ability to assign credit unveils structural invariants in the tasks that can be transferred to make RL more sample-efficient. Our main contribution is SECRET, a novel approach to transfer learning for RL t…

Cited by 10 publications (7 citation statements). References 15 publications.
“…Observations contain the position and velocity of the arm and the gripper, as well as the target position, S = ℝ^13. Again, we use the dense binary reward signal defined in Eq.…”
Section: Environments
confidence: 99%
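For concreteness, below is a minimal sketch of how such a 13-dimensional observation space could be declared with Gymnasium. The exact split of the 13 entries among arm position/velocity, gripper state, and target position is an assumption for illustration, not taken from the cited paper.

```python
# Minimal sketch: a 13-dimensional continuous observation space.
# The per-entry layout is hypothetical, chosen only to total 13 dimensions.
import numpy as np
from gymnasium import spaces  # the older `gym` package would work the same way

observation_space = spaces.Box(
    low=-np.inf, high=np.inf, shape=(13,), dtype=np.float32
)

# Hypothetical layout of a single observation vector in S = R^13:
#   obs[0:3]   arm position       (3)
#   obs[3:6]   arm velocity       (3)
#   obs[6:9]   gripper position   (3)  -- assumed
#   obs[9:10]  gripper opening    (1)  -- assumed
#   obs[10:13] target position    (3)
obs = observation_space.sample()
assert obs.shape == (13,)
```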
“…Transformers have also seen success at the reward redistribution problem. An alternative model to RUDDER was introduced as SECRET [9]. Rather than using LSTMs, SECRET's reward redistribution model is instead realized as a transformer decoder network [34] which learns to predict the sign of r_t at every timestep.…”
Section: Transformer
confidence: 99%
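As a rough illustration of the mechanism described in that statement, the sketch below builds a causally masked self-attention model in PyTorch that outputs, at every timestep, logits for the sign of r_t (negative, zero, or positive). The layer sizes, the input featurization (concatenated observation and action), the omission of positional encodings, and the 3-class target are illustrative assumptions, not the published SECRET architecture.

```python
# Sketch of a decoder-style (causally masked) self-attention model that
# classifies sign(r_t) at every timestep of a trajectory.
import torch
import torch.nn as nn

class RewardSignPredictor(nn.Module):
    def __init__(self, obs_dim, act_dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=128, batch_first=True
        )
        # Causally masked self-attention stack: each timestep attends only to
        # itself and earlier timesteps (decoder-style, as in GPT-like models).
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 3)  # logits for sign(r_t): -, 0, +

    def forward(self, obs, act):
        # obs: (B, T, obs_dim), act: (B, T, act_dim); positional encodings
        # are omitted here for brevity.
        x = self.embed(torch.cat([obs, act], dim=-1))          # (B, T, d_model)
        T = x.size(1)
        causal_mask = torch.triu(
            torch.full((T, T), float("-inf"), device=x.device), diagonal=1
        )
        h = self.backbone(x, mask=causal_mask)                 # (B, T, d_model)
        return self.head(h)                                    # (B, T, 3)

# Toy usage on random trajectories.
B, T, obs_dim, act_dim = 8, 20, 13, 4
model = RewardSignPredictor(obs_dim, act_dim)
logits = model(torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim))
targets = torch.randint(0, 3, (B, T))  # 0: negative, 1: zero, 2: positive
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 3), targets.reshape(-1))
loss.backward()
```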
“…However, reward shaping is often difficult because it requires environment-specific knowledge. For single-agent problems, frameworks like RUDDER and SECRET [1,9] allow constructing neural network models for reward redistribution, learning how sparse, delayed rewards can be transformed into dense rewards for effective policy optimization. Unfortunately, in the multi-agent reinforcement learning (MARL) setting, sparse and delayed rewards have not been explored extensively [13].…”
Section: Introduction
confidence: 99%
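The redistribution idea referenced in that statement can be sketched in a few lines: a sparse, delayed reward (here a single terminal reward) is spread over earlier timesteps in proportion to per-step credit weights, for instance attention weights from a model like the one sketched above. The proportional rule below is an illustrative assumption, not the exact scheme of RUDDER or SECRET.

```python
# Sketch: turn a sparse reward sequence into a dense one with the same total
# return, distributing credit according to per-step weights.
import numpy as np

def redistribute(sparse_rewards, credit_weights):
    """Spread the episodic return over timesteps in proportion to credit."""
    sparse_rewards = np.asarray(sparse_rewards, dtype=np.float64)
    w = np.asarray(credit_weights, dtype=np.float64)
    w = w / w.sum()                      # normalize credit over the trajectory
    return sparse_rewards.sum() * w      # dense rewards, same episodic return

sparse = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # reward only at the end
credit = np.array([0.1, 0.4, 0.1, 0.3, 0.1])   # hypothetical credit weights
dense = redistribute(sparse, credit)
assert np.isclose(dense.sum(), sparse.sum())   # the return is preserved
```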
“…task success or high returns). Existing approaches complement or modify RL algorithms by either decomposing observed returns as the sum of redistributed rewards along observed trajectories [Arjona-Medina et al., 2019, Ferret et al., 2019, Hung et al., 2019, Raposo et al., 2021] or incorporating hindsight information into the RL process [Harutyunyan et al., 2019, Ferret et al., 2021, Mesnard et al., 2021]. Our approach is related but differs in several points: it is tied to (and aims at making sense of) performance improvements instead of outcomes, and it comes from an abstraction over MDPs (which is non-parametric).…”
Section: Related Work
confidence: 99%
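The return-decomposition constraint mentioned in that statement (the observed return should equal the sum of the redistributed per-step rewards) is often enforced as a regression objective. The squared-error form below is an illustrative, RUDDER-style choice rather than a claim about any single cited paper.

```python
# Sketch of a return-decomposition objective: predicted per-step rewards are
# trained so that their sum matches the observed episodic return.
import torch

def decomposition_loss(predicted_step_rewards, observed_return):
    """MSE between the observed return and the sum of redistributed rewards."""
    predicted_return = predicted_step_rewards.sum(dim=-1)   # (batch,)
    return ((predicted_return - observed_return) ** 2).mean()

r_hat = torch.randn(4, 10, requires_grad=True)  # 4 trajectories, 10 steps each
G = torch.randn(4)                              # observed episodic returns
loss = decomposition_loss(r_hat, G)
loss.backward()
```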