2022
DOI: 10.1371/journal.pcbi.1009816

Uncertainty–guided learning with scaled prediction errors in the basal ganglia

Abstract: To accurately predict rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when the observations are noisy, the individual rewards should have less influence on tracking of average reward, and the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of the observation noise might be tracked and used to control prediction updates in the brain reward system. Here, we …

Citations: cited by 10 publications (21 citation statements)
References: 47 publications
“…by investigating whether the outcome of an “explore” trial (a trial on which the option with lower Q value is chosen, likely to be associated with higher novelty signal) is statistically more influential on the outcome of the next trial (suggesting a higher learning rate). Within the model framework of reinforcement learning through direct and indirect striatal pathways, Möller et al [23] had a different take on modulated belief updating, which considers the circuit dynamics at the time of reward presentation and predicts that the reward prediction error itself should be scaled by the estimated spread of the reward distribution (Equation 12). Theoretically, this could be combined with the learning rate modulation by novelty, and from a physiological perspective, the novelty signal should take effect on the target striatal neurons before reward presentation, whereas the dynamics that leads to the scaled prediction error signal occurs after reward presentation.…”
Section: Discussion (mentioning)
confidence: 99%
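
The idea in the statement above, a reward prediction error scaled by the estimated spread of the reward distribution, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the equations of Möller et al or of the citing paper: the update rules, the function names `scaled_update` and `run`, the learning rates, and the `min_spread` floor are all hypothetical choices made for the example.

```python
import random


def scaled_update(mean, spread, reward, lr_mean=0.1, lr_spread=0.1, min_spread=1e-3):
    """One learning step with a prediction error scaled by the estimated spread.

    Assumed (illustrative) rules, not taken from the cited paper:
      scaled error = (reward - mean) / spread
      mean   moves by lr_mean * scaled error
      spread moves toward the absolute unscaled error
    """
    raw_error = reward - mean
    scaled_error = raw_error / max(spread, min_spread)
    new_mean = mean + lr_mean * scaled_error
    new_spread = spread + lr_spread * (abs(raw_error) - spread)
    # Keep the spread strictly positive so the division stays defined.
    return new_mean, max(new_spread, min_spread)


def run(noise_sd, n_trials=2000, true_mean=5.0, seed=0):
    """Learn from Gaussian rewards and return the final (mean, spread) estimates."""
    rng = random.Random(seed)
    mean, spread = 0.0, 1.0
    for _ in range(n_trials):
        reward = rng.gauss(true_mean, noise_sd)
        mean, spread = scaled_update(mean, spread, reward)
    return mean, spread


if __name__ == "__main__":
    for sd in (0.5, 4.0):
        m, s = run(sd)
        print(f"noise sd={sd}: estimated mean={m:.2f}, estimated spread={s:.2f}")
```

With the same raw error, a larger spread estimate gives a smaller change in the mean estimate, which is the behaviour the abstract describes for noisy observations.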
“…On the other hand, a continuous gradual shift in the reward distribution would be more difficult to optimise for. The learning rule with scaled reward prediction error proposed by Möller et al [23] is beneficial when the spread of the reward distribution (“noisiness” of the reward) is variable, but not when drastic changes in the mean reward occur. It would be interesting to further investigate the learning dynamics and the resulting effect on exploration modulation in these scenarios with this alternative learning rule, potentially also combined with the dynamic learning rate we used in this study.…”
Section: Discussion (mentioning)
confidence: 99%
“…(2) Side-effect models. It has been suggested that risk sensitivity is a side-effect of either the general properties of mechanisms for learning or a particular mechanism for assessing rewards (Kacelnik & Bateson, 1996; March, 1996; McNamara, 1996; Niv et al, 2002; Buchkremer & Reinhold, 2010; Kacelnik & El Mouden, 2013). If this claim is true, then attempts to view risk-sensitive behaviour as an adaptation to uncertainty are misguided.…”
Section: Causal Models (1) Neurocognitive Models (mentioning)
confidence: 99%
“…In experiments on RSF, non-human animals have to learn about the options. It is known that learning can result in risk-sensitive behaviour (Regelmann, 1986; March, 1996; Niv et al, 2002; Buchkremer & Reinhold, 2010). Our focus in this section is the following argument that predicts risk aversion from general properties of learning (Kacelnik & Bateson, 1996; Bateson & Kacelnik, 1998; Kacelnik & El Mouden, 2013): 'Given the concave relation between reinforcement effects and reward size and the convex relation between reinforcement effects and delay, Jensen's inequality implies that variance in amount should have a negative impact on reinforcement and variance in delay a positive one.…”
Section: (B) Learning (mentioning)
confidence: 99%
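
The Jensen's inequality argument quoted above can be checked with a small numerical example. The two value functions here are assumptions chosen only because they have the required shapes (a square root is concave in amount, hyperbolic discounting is convex in delay); the quoted passage does not specify any particular function, and the amounts and delays are made-up numbers.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def value_of_amount(amount):
    # Assumed concave relation between reinforcement and reward size.
    return math.sqrt(amount)


def value_of_delay(delay):
    # Assumed convex relation between reinforcement and delay
    # (hyperbolic discounting with a hypothetical rate of 1).
    return 1.0 / (1.0 + delay)


# Fixed and variable options share the same average amount (5) and delay (5).
fixed_amount, variable_amounts = 5.0, [0.0, 10.0]
fixed_delay, variable_delays = 5.0, [1.0, 9.0]

amount_fixed = value_of_amount(fixed_amount)
amount_variable = mean([value_of_amount(a) for a in variable_amounts])
delay_fixed = value_of_delay(fixed_delay)
delay_variable = mean([value_of_delay(d) for d in variable_delays])

if __name__ == "__main__":
    print(f"amount: fixed={amount_fixed:.3f}, variable={amount_variable:.3f}")
    print(f"delay:  fixed={delay_fixed:.3f}, variable={delay_variable:.3f}")
```

Under these assumed functions, variance in amount lowers the average reinforcement relative to the fixed option, while variance in delay raises it, matching the direction of the quoted argument.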