2015
DOI: 10.7554/elife.12029

A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning

Abstract: Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning three Marr's levels of analysis. In the neural model, TANs modulate the excitability o…
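The abstract's core computational claim is that reward uncertainty should dynamically scale the learning rate. The sketch below is not the authors' neural or algorithmic model; it is a minimal Python illustration assuming a two-armed bandit, Beta-distribution pseudo-counts as the uncertainty signal, and arbitrary gain and decay constants chosen for illustration only.

```python
# Minimal sketch (not the paper's model): a two-armed bandit learner whose
# learning rate scales with its current reward uncertainty, the computational
# idea the abstract attributes to striatal TAN signalling.
import numpy as np

rng = np.random.default_rng(0)

# Beta-distribution pseudo-counts per action track reward-probability estimates.
alpha = np.ones(2)
beta = np.ones(2)

true_p = np.array([0.8, 0.2])   # hypothetical reward probabilities
n_trials = 500
reversal_at = 250               # reward contingencies flip mid-session

q = np.full(2, 0.5)             # action values
for t in range(n_trials):
    if t == reversal_at:
        true_p = true_p[::-1]

    # Softmax action selection.
    logits = 3.0 * q
    p_choice = np.exp(logits - logits.max())
    p_choice /= p_choice.sum()
    a = rng.choice(2, p=p_choice)

    r = float(rng.random() < true_p[a])

    # Posterior variance of the chosen arm's reward rate = uncertainty signal.
    var = (alpha[a] * beta[a]) / ((alpha[a] + beta[a]) ** 2 * (alpha[a] + beta[a] + 1))
    lr = min(1.0, 10.0 * var)   # higher uncertainty -> larger learning rate (gain of 10 is arbitrary)

    q[a] += lr * (r - q[a])     # prediction-error update with the adaptive rate

    # Leaky pseudo-count update so uncertainty can rise again after a change.
    alpha[a] = 0.98 * alpha[a] + r
    beta[a] = 0.98 * beta[a] + (1.0 - r)
```

The leaky pseudo-counts keep the posterior variance from collapsing to zero, so uncertainty, and with it the effective learning rate, can recover after the mid-session reversal.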

Cited by 85 publications (81 citation statements); references 85 publications.
“…While the two functions are not mutually exclusive, our data provide strong support for the second interpretation: On a trial-by-trial basis, the degree of ramping across regions was related to the latency to reward peak elicited by the wave, and the combination of ramp slope and wave magnitude was predictive of subsequent-trial behavioral adjustments. These findings accord with views that dopamine signals can have different functions during reward pursuit and outcome, which can be gated by local microcircuit elements that regulate plasticity windows (Berke, 2018; Bradfield et al., 2013; Franklin and Frank, 2015; Morris et al., 2004; Threlfell and Cragg, 2011). Moreover, we also interpret transient and localized RPEs during reward pursuit as facilitating inference about the current task state (i.e., determining credit), whereas RPEs during reward itself facilitates reinforcement learning; a dual operation that can also be gated (Franklin and Frank, 2015; Gershman et al., 2015; Redish et al., 2007; Schoenbaum et al., 2013).…”
Section: Discussion (supporting)
confidence: 88%
“…These findings accord with views that dopamine signals can have different functions during reward pursuit and outcome, which can be gated by local microcircuit elements that regulate plasticity windows (Berke, 2018; Bradfield et al., 2013; Franklin and Frank, 2015; Morris et al., 2004; Threlfell and Cragg, 2011). Moreover, we also interpret transient and localized RPEs during reward pursuit as facilitating inference about the current task state (i.e., determining credit), whereas RPEs during reward itself facilitates reinforcement learning; a dual operation that can also be gated (Franklin and Frank, 2015; Gershman et al., 2015; Redish et al., 2007; Schoenbaum et al., 2013). Put together, the synthesis of our data and computational simulations imply that dopamine signals are spatio-temporally vectorized during both epochs, tailored to underlying region's computational specialty.…”
Section: Discussion (supporting)
confidence: 88%
“…In their model, the estimate of reward probability is updated only if a change is detected and if so, a new estimate of reward probability can be made depending on the location of the detected change. Interestingly, a recent modeling study has shown that increased responsiveness to change-points can be instantiated by pauses in tonically active interneurons in the striatum enabling the modulation of learning rate by reward uncertainty (Franklin and Frank, 2015). Although we did not incorporate a change-detection mechanism, such a mechanism would only improve the performance of our model (Gallistel et al, 2001; McGuire et al, 2014).…”
Section: Discussion (mentioning)
confidence: 99%
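The excerpt above ties change-point detection to transient increases in the learning rate (instantiated, in the cited model, by TAN pauses). Below is a hedged sketch of that gating idea only; the three-trial surprise window, the threshold, and the base and boosted rates are arbitrary assumptions, not parameters from any of the cited models.

```python
# Illustrative sketch only: boost the learning rate when recent prediction
# errors are large enough to suggest a change-point, loosely mirroring the
# idea that detected changes license faster updating of reward estimates.
def adaptive_update(value, reward, recent_abs_errors,
                    base_lr=0.1, boost_lr=0.6, threshold=0.5):
    """Return (updated value estimate, learning rate used).

    recent_abs_errors: recent |prediction error| values maintained by the
    caller. The window size, threshold, and both rates are arbitrary
    assumptions, not values taken from the cited papers.
    """
    error = reward - value
    changepoint_suspected = (
        len(recent_abs_errors) >= 3
        and sum(recent_abs_errors[-3:]) / 3.0 > threshold
    )
    lr = boost_lr if changepoint_suspected else base_lr
    return value + lr * error, lr
```

A caller would maintain `recent_abs_errors` itself, appending `abs(reward - value)` after each trial; keeping the detection logic outside the update mirrors the separation between change detection and value estimation described in the excerpt.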
“…This suggests a more complex mechanism in which perseveration is influenced, in part, by the learning rate from negative prediction errors (which can change due to task demand) and by resting levels of DS CHO. Indeed, Franklin et al (2015) showed that a model which takes into account cholinergic activity performs better on a reversal learning task than a model based solely on dopamine prediction error signalling (Franklin & Frank, 2015).…”
Section: Discussion (mentioning)
confidence: 99%