2019
DOI: 10.1101/653493
Preprint
Reward-predictive representations generalize across tasks in reinforcement learning

Abstract: One question central to reinforcement learning is which representations can be generalized or re-used across different tasks. Existing algorithms that facilitate transfer typically are limited to cases in which the transition function or the optimal policy is portable to new contexts, but achieving "deep transfer" characteristic of human behavior has been elusive. This article demonstrates that model reductions that minimize error in predictions of reward outcomes generalize across tasks with different transit…

Cited by 10 publications (26 citation statements)
References 36 publications
“…Based on our model ( Fig. 7 ) and the general principle of reward/effort maximization 57,58 , we propose the following explanation for our neurophysiological data. Our previous results suggest that the FEF and SEF continue to show an eye-centered target-relative-to-eye to gaze-relative-to-eye transformation for saccades in the presence of a landmark 15,16 , but their visual signals are influenced by landmarks in a fashion that depends on target-landmark configuration 32 .…”
Section: Discussion
confidence: 85%
“…Note that individual trials are directed pseudorandomly (as in our data) but the overall gaze distribution maximizes reward across trials. Overall, this strategy maximizes reward outcome based on visual cues and their link to expected probabilistic events 57,58 . In lay terms, the model makes an educated ‘guess’.…”
Section: Discussion
confidence: 99%
“…Previous work has also cast the learning of latent task components and structures as a problem of nonparametric Bayesian inference [22, 10, 11, 12, 13, 31], and our work continues in this same vein; but Bayesian inference must now be performed over both individual latent components and a potentially infinite variety of structures composed of these components. Bayesian non-parametrics also offers one possible solution to two of the challenges inherent in continual learning.…”
Section: Discussion
confidence: 92%
“…On a higher level of abstraction, high-dimensional MDPs, as one finds in sensory-rich naturalistic tasks, can often be compressed to discover regions of state spaces that are equivalent for the sake of planning. Discovery of such lower dimensional latent states allows an agent to transfer to novel MDPs with entirely novel reward and transition functions, as long as they retain the abstract latent structure [13]. Indeed, this work showed that one could discover useful abstractions in a guitar playing task (e.g., that all fret positions that yield a “C” note are equivalent) that allows an agent to much more rapidly learn new scales.…”
Section: Discussion
confidence: 99%
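The abstraction described in the quote above can be illustrated with a minimal sketch. This is not the authors' code: it uses a one-step simplification of the reward-predictive criterion (grouping states whose expected reward agrees for every action, rather than matching full future reward sequences), and the `reward_predictive_abstraction` function and the toy guitar states are illustrative assumptions.

```python
# Hypothetical sketch: reward-predictive state abstraction collapses ground
# states that are interchangeable for predicting reward, so the compressed
# model can transfer to new tasks that preserve the same latent structure.
# Simplification: states are clustered by one-step reward profiles only.

from collections import defaultdict


def reward_predictive_abstraction(states, actions, reward):
    """Map each ground state to a latent-state index, grouping states
    whose reward profile across all actions is identical."""
    clusters = defaultdict(list)
    for s in states:
        profile = tuple(reward(s, a) for a in actions)
        clusters[profile].append(s)
    return {
        s: i
        for i, (_, members) in enumerate(sorted(clusters.items()))
        for s in members
    }


# Toy version of the guitar example: fret positions (string, fret) that
# sound the same note are reward-equivalent, so they share one latent state.
states = [("E", 8), ("A", 3), ("B", 1)]   # three ways to play a C note
actions = ["strum"]
reward = lambda s, a: 1.0                 # each position yields the target note
phi = reward_predictive_abstraction(states, actions, reward)
assert len(set(phi.values())) == 1        # all three collapse to one latent state
```

An agent planning over `phi`'s latent states rather than ground states can reuse that compressed model in a new task (e.g., a new scale) so long as the note-equivalence structure is retained.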