2021
DOI: 10.1101/2021.06.14.448422
Preprint

Distributional reinforcement learning in prefrontal cortex

Abstract: Prefrontal cortex is crucial for learning and decision-making. Classic reinforcement learning (RL) theories centre on learning the expectation of potential rewarding outcomes and explain a wealth of neural data in prefrontal cortex. Distributional RL, on the other hand, learns the full distribution of rewarding outcomes and better explains dopamine responses. Here we show distributional RL also better explains prefrontal cortical responses, suggesting it is a ubiquitous mechanism for reward-guided learning.
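
To make the contrast concrete, here is a minimal sketch (illustrative only, not the authors' analysis code) comparing the two learning rules. A classic RL learner converges to the expected reward, while a distributional learner uses a population of units with asymmetric learning rates for positive versus negative prediction errors, so each unit converges to a different expectile of the reward distribution. The reward distribution, learning rates, and asymmetry levels are assumptions chosen for the demo.

```python
# Sketch: classic (mean-tracking) TD vs. distributional (expectile) TD.
import numpy as np

rng = np.random.default_rng(0)
# Assumed three-outcome reward distribution; its true mean is 1.35.
rewards = rng.choice([0.1, 1.0, 5.0], size=20_000, p=[0.5, 0.3, 0.2])

# Classic RL: a single value estimate with a symmetric learning rate.
v = 0.0
for r in rewards:
    v += 0.01 * (r - v)                 # converges to E[r]

# Distributional RL: one estimate per asymmetry level tau.
taus = np.linspace(0.1, 0.9, 9)         # "optimism" of each unit
values = np.zeros_like(taus)
for r in rewards:
    delta = r - values
    # Scale positive errors by tau and negative errors by (1 - tau):
    # the fixed point of this update is the tau-th expectile.
    values += 0.01 * np.where(delta > 0, taus, 1 - taus) * delta

print(f"classic estimate: {v:.2f}")
print("expectile estimates:", np.round(values, 2))
```

Taken together, the nine expectile estimates characterize the shape of the reward distribution, whereas the classic learner retains only its mean; this population-level readout is the sense in which distributional RL "learns the full distribution".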

Cited by 5 publications (4 citation statements). References 37 publications.
“…Numerous theoretical models currently exist which attempt to provide normative descriptions of this role of the PFC, often from the perspective of reinforcement learning, meta-learning, or both (J. X. Wang et al., 2018; Muller et al., 2024).…”
Section: Discussion
confidence: 99%
“…The CPD was estimated at each timepoint for each neuron within a region and the average across all neurons per region was plotted over time. Furthermore, the CPD values were averaged within a larger 300 millisecond time window locked to choice onset, corresponding to conventional time windows used previously to identify value-related representations emerging in these four regions [1,4].…”
Section: Methods
confidence: 99%
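
The quoted Methods text refers to a coefficient of partial determination (CPD) analysis. As a hedged sketch of what that computation typically looks like (the toy data and variable names are illustrative, not taken from the cited paper), the CPD of a regressor is the fractional increase in a linear model's residual error when that regressor is removed:

```python
import numpy as np

def cpd(y, X_full, col):
    """Coefficient of partial determination for one regressor: fractional
    rise in residual sum of squares when that column is dropped."""
    def sse(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.sum((y - X @ beta) ** 2)
    sse_full = sse(X_full)
    sse_reduced = sse(np.delete(X_full, col, axis=1))
    return (sse_reduced - sse_full) / sse_reduced

# Toy example: one neuron's firing rates across 200 trials, regressed on
# an intercept, a value regressor, and a nuisance regressor. In practice
# this is repeated per neuron and per timepoint, averaged across neurons,
# and also averaged within a 300 ms window locked to choice onset.
rng = np.random.default_rng(1)
value, nuisance = rng.normal(size=200), rng.normal(size=200)
rates = 0.8 * value + 0.3 * nuisance + rng.normal(size=200)
X = np.column_stack([np.ones(200), value, nuisance])
print(f"CPD(value) = {cpd(rates, X, col=1):.3f}")
```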
“…Temporal difference learning always converges to the true stimulus values; however, people often deviate from this linear learning trajectory in important ways. For example, there are asymmetric learning rates for rewarding vs. punishing stimuli (Muller et al., 2021), the value function to be learned can be non-linear, and learning itself is often distributed (François-Lavet et al., 2018). Further, when the number of states (e.g., a spatial location, a physiological state of being, or a mental state of mind) is large, the time it takes to learn the value function is infeasible for animals.…”
Section: Modeling Exploitation
confidence: 99%
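
For reference, the convergence property this quote starts from can be reproduced with tabular TD(0) in a few lines; the chain environment and parameters below are assumptions for illustration, not taken from any cited paper. Swapping the single learning rate for the asymmetric positive/negative rates the quote mentions gives the expectile-style distributional update sketched after the abstract above.

```python
import numpy as np

# Tabular TD(0) on a deterministic 3-state chain: reward 1 on leaving the
# last state, discount gamma. The true values are V = [0.81, 0.9, 1.0].
n_states, gamma, alpha = 3, 0.9, 0.05
V = np.zeros(n_states)

for _ in range(5_000):                 # episodes
    for s in range(n_states):          # walk left to right, then terminate
        r = 1.0 if s == n_states - 1 else 0.0
        v_next = V[s + 1] if s < n_states - 1 else 0.0
        V[s] += alpha * (r + gamma * v_next - V[s])   # TD(0) update

print(np.round(V, 3))                  # -> approximately [0.81, 0.9, 1.0]
```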