2023
DOI: 10.1101/2023.11.12.566754
Preprint

Multi-timescale reinforcement learning in the brain

Paul Masset,
Pablo Tano,
HyungGoo R. Kim
et al.

Abstract: To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning1, a class of algorithms that has been successful at training artificial agents2–6 and at characterizing the firing of dopamine neurons in the midbrain7–9. In classical reinforcement learning, agents discount future rewards exponentially according to a single time scale, controlled by the discount factor. Here, we explo…
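The abstract contrasts a single exponential discount factor with the multi-timescale scheme named in the title. A minimal sketch of that contrast, assuming a toy reward sequence and illustrative discount factors (none of the numbers below are taken from the paper):

```python
import numpy as np

# Hypothetical reward sequence: a single reward arriving 5 steps in the future.
rewards = np.zeros(10)
rewards[5] = 1.0

# Classical RL: one discount factor gamma weights future rewards exponentially.
gamma = 0.9
value_single = sum(gamma**t * r for t, r in enumerate(rewards))

# Multi-timescale variant: a population of value estimates, each with its own
# gamma, yields a vector of discounted values rather than a single scalar.
gammas = np.array([0.5, 0.7, 0.9, 0.99])
values_multi = np.array([sum(g**t * r for t, r in enumerate(rewards)) for g in gammas])

print(value_single)   # ~0.59 (= 0.9**5)
print(values_multi)   # one value per assumed timescale
```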

Cited by 7 publications (3 citation statements)
References: 83 publications
“…The transition and observation matrices, which are used to compute the probability of each state, were derived from the experimental settings, assuming a fixed probability of transition from the Wait to Pre state, modeling a growing anticipation of the next trial beginning. Using this state representation improved the quantitative accuracy of the model for a given γ versus the Cue-Context, and accurately predicted the experimental data at a value of γ consistent with previously reported results 4649 (Fig. 3f, Extended Data Fig. 4).…”
Section: Results (supporting)
confidence: 84%
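The quoted passage describes computing state probabilities from transition and observation matrices. A minimal belief-state filtering sketch in that spirit; the Wait and Pre state names follow the quote, but the third state, the transition hazard, and the observation probabilities are placeholders assumed purely for illustration:

```python
import numpy as np

states = ["Wait", "Pre", "ISI"]              # "ISI" is an assumed third state
T = np.array([[0.9, 0.1, 0.0],               # fixed Wait -> Pre transition hazard
              [0.0, 0.0, 1.0],               # Pre leads into the ISI
              [0.0, 0.0, 1.0]])              # ISI is absorbing in this toy model
O = np.array([[0.95, 0.05],                  # P(observation | state); columns:
              [0.80, 0.20],                  # no-cue, cue (placeholder values)
              [0.10, 0.90]])

def belief_update(b, obs):
    """One filtering step: predict with T, weight by O, renormalize."""
    predicted = T.T @ b
    posterior = O[:, obs] * predicted
    return posterior / posterior.sum()

b = np.array([1.0, 0.0, 0.0])                # start fully in the Wait state
b = belief_update(b, obs=0)                  # no cue observed on this step
print(dict(zip(states, np.round(b, 3))))     # probability mass shifts toward Pre
```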
“…By increasing the context, and thus the value, during the pre-Odor ITI period, the TD error at Odor A is diminished. While this produces a qualitatively correct pattern of results, it requires a temporal discount factor well below previously reported values 4649 to produce the quantitatively correct pattern (Extended Data Fig. 4).…”
Section: Results (mentioning)
confidence: 57%
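The statement above reasons about how a larger pre-cue context value shrinks the TD error at Odor A. A one-line worked version of that relation, delta = r + gamma * V(cue) - V(context), with all numbers assumed for illustration:

```python
gamma, V_cue, r = 0.95, 1.0, 0.0   # assumed discount factor, cue value, no reward at cue time

for V_context in (0.0, 0.3, 0.6):
    delta = r + gamma * V_cue - V_context
    print(f"pre-cue context value {V_context:.1f} -> TD error at Odor A {delta:.2f}")
```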
“…Though recent work has shown that DA in the tail of the striatum elicited by cue-evoked action execution decreases over time 40 , supporting the idea that action responses are modulated by predictability rather than simple motor correlates, other experiments must still be conducted in order to establish that DA neurons encode an APE fully analogous to the TD RPE. Notably, a TD prediction error transfers with learning from the predicted outcome to cues predicting it (at least depending on the time discount parameter, which might also vary between circuits 26,99,100 ). By analogy, a TD APE predicts that cues that elicit actions should produce DA signals that increase with training, and that “omission” of an action predicted by a cue (say, by introduction of a no-go signal) should yield a reduction in DA.…”
Section: Discussion (mentioning)
confidence: 99%
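The passage invokes the standard property that a TD prediction error transfers with learning from the predicted outcome to the cue that predicts it. A minimal tabular TD(0) sketch of that transfer, with the learning rate, discount factor, and trial count chosen only for illustration:

```python
alpha, gamma, n_trials = 0.1, 0.95, 200
V_cue = 0.0                        # value of the cue state; the ITI value is held
                                   # at 0 because the cue arrives unpredictably
for trial in range(n_trials):
    rpe_at_cue = gamma * V_cue - 0.0      # ITI -> cue transition, no reward yet
    rpe_at_reward = 1.0 - V_cue           # cue -> terminal transition, reward = 1
    V_cue += alpha * rpe_at_reward        # TD(0) update of the cue's value
    if trial in (0, 19, 199):
        print(f"trial {trial + 1:3d}: RPE at cue {rpe_at_cue:+.2f}, "
              f"at reward {rpe_at_reward:+.2f}")
```

With training, the printed error at reward time decays toward zero while the error at cue onset grows toward gamma, which is the transfer described in the quoted discussion.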