Reward prediction errors (RPEs) and risk preferences have two things in common: both can shape decision-making behavior, and both are commonly associated with dopamine. RPEs drive value learning and are thought to be represented in the phasic release of striatal dopamine. Risk preferences bias choices towards or away from uncertainty; they can be manipulated with drugs that target the dopaminergic system. Based on this common neural substrate, we hypothesize that RPEs and risk preferences are linked at the level of behavior as well. Here, we develop this hypothesis theoretically and test it empirically. First, we apply a recent theory of learning in the basal ganglia to predict how RPEs influence risk preferences. We find that positive RPEs should increase risk-seeking, while negative RPEs should increase risk-aversion. We then test these behavioral predictions using a novel bandit task in which value and risk vary independently across options. Critically, the task includes conditions in which options vary in risk but are matched for value. The prediction is confirmed: participants become more risk-seeking when choices are preceded by positive RPEs, and more risk-averse when choices are preceded by negative RPEs. These findings cannot be explained by other known effects, such as nonlinear utility curves or dynamic learning rates.
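To make the hypothesized mechanism concrete, here is a minimal Python sketch of an agent in a matched-value bandit. The option parameters, the logistic choice rule, and all names (`SAFE`, `RISKY`, `p_risky`, `slope`) are our illustrative assumptions, not the authors' task design: the two options have equal mean reward but different variance, and the probability of choosing the risky one rises with the RPE from the previous outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical option pair: matched mean reward, different risk (variance).
SAFE = dict(mean=1.0, sd=0.1)
RISKY = dict(mean=1.0, sd=1.0)

def draw(option):
    return rng.normal(option["mean"], option["sd"])

def p_risky(last_rpe, bias=0.0, slope=1.5):
    """Assumed choice rule: probability of picking the risky option
    grows with the preceding reward prediction error."""
    return 1.0 / (1.0 + np.exp(-(bias + slope * last_rpe)))

value = 0.0      # running value estimate shared by the matched options
alpha = 0.2      # learning rate (assumed)
last_rpe = 0.0
choices = []
for t in range(1000):
    risky = rng.random() < p_risky(last_rpe)
    choices.append((last_rpe, risky))    # record the RPE that preceded this choice
    r = draw(RISKY if risky else SAFE)
    last_rpe = r - value                 # RPE on the current outcome
    value += alpha * last_rpe            # standard delta-rule update

pos = [c for rpe, c in choices if rpe > 0]
neg = [c for rpe, c in choices if rpe < 0]
print(np.mean(pos), np.mean(neg))  # risky-choice rate after positive vs. negative RPEs
```

Because the options are matched for value, any difference between the two printed rates reflects the RPE-driven shift in risk preference rather than a difference in learned value, which mirrors the logic of the matched-value conditions described above.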
To accurately predict the rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when observations are noisy, each individual reward should have less influence on the tracked average; that is, the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of the observation noise might be tracked and used to control prediction updates in the brain's reward system. Here, we introduce a new model that uses simple, tractable learning rules to track the mean and standard deviation of reward, and leverages prediction errors scaled by uncertainty as the central feedback signal. We show that the new model outperforms conventional reinforcement learning models in a value-tracking task, and approaches the theoretical limit of performance provided by the Kalman filter. Further, we propose a possible biological implementation of the model in the basal ganglia circuit. In the proposed network, dopaminergic neurons encode reward prediction errors scaled by the standard deviation of rewards. We show that such scaling may arise if striatal neurons learn the standard deviation of rewards and modulate the activity of dopaminergic neurons. The model is consistent with experimental findings on the scaling of the dopamine prediction error relative to reward magnitude, and with many features of striatal plasticity. Our results span the levels of implementation, algorithm, and computation, and may have important implications for understanding the dopaminergic prediction-error signal and its relation to adaptive and effective learning.
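As one reading of the learning rules described above, here is a hedged Python sketch; the exact update equations are our assumptions, not necessarily the paper's. The mean estimate is moved by the prediction error divided by the estimated standard deviation, and the spread estimate is nudged toward the absolute error.

```python
import numpy as np

def scaled_update(mu, sigma, r, alpha=0.1, sigma_min=1e-3):
    """One illustrative learning step (our reading of the abstract,
    not the paper's exact equations)."""
    delta = r - mu                           # raw reward prediction error
    scaled = delta / max(sigma, sigma_min)   # dopamine-like, uncertainty-scaled error
    mu = mu + alpha * scaled                 # noisy rewards move the mean less per unit error
    sigma = sigma + alpha * (abs(delta) - sigma)  # tracks the spread of rewards
    return mu, sigma

# Tiny demo: track a noisy reward stream.
rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0
for _ in range(500):
    mu, sigma = scaled_update(mu, sigma, rng.normal(5.0, 2.0))
print(mu, sigma)  # mu approaches 5; sigma approaches the mean absolute deviation (~1.6)
```

With these rules, the effective step on the mean shrinks as the estimated noise grows, which is the qualitative behavior the abstract attributes to the model and which echoes how the Kalman gain falls with observation noise.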
We provide a spectral norm concentration inequality for infinite random matrices with independent rows. This complements earlier results by Mendelson, Pajor, Oliveira, and Rauhut. As an application, we study L₂-norm sampling discretization and the recovery of functions in a reproducing kernel Hilbert space (RKHS) on D ⊂ ℝ^d based on random function samples, where we only assume that the kernel has finite trace. We provide several concrete estimates with precise constants for the corresponding worst-case errors. The failure probability is controlled and decays polynomially in n, the number of samples. In general, our analysis does not need any additional assumptions and also includes the case of kernels on non-compact domains. However, under the mild additional assumption of separability, we observe improved rates of convergence.
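For orientation only, the kind of object such inequalities control can be sketched as follows; this is a representative shape under our assumptions, not the paper's exact statement.

```latex
% Representative shape only (our paraphrase): for n independent copies
% x_1, ..., x_n of a random vector x in a Hilbert space, the empirical
% second-moment operator concentrates around its mean in the spectral norm:
\[
  \mathbb{P}\!\left(
    \Bigl\| \frac{1}{n}\sum_{i=1}^{n} x_i \otimes x_i
      - \mathbb{E}\,[\,x \otimes x\,] \Bigr\| \ge t
  \right) \le \varepsilon(n, t),
\]
% with a failure probability \varepsilon(n, t) that, as stated in the
% abstract, decays polynomially in the number of samples n.
```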
Reinforcement learning theories propose that humans choose based on the estimated values of available options, and that they learn from rewards by reducing the difference between the experienced and expected value. In the brain, such prediction errors are broadcast by dopamine. However, choices are not only influenced by expected value, but also by risk. Like reinforcement learning, risk preferences are modulated by dopamine: enhanced dopamine levels induce risk-seeking. Learning and risk preferences have so far been studied independently, even though it is commonly assumed that they are (partly) regulated by the same neurotransmitter. Here, we use a novel learning task to look for prediction-error-induced risk-seeking in human behavior and pupil responses. We find that prediction errors are positively correlated with risk preferences in imminent choices. Physiologically, this effect is indexed by pupil dilation: only participants whose pupil response indicates that they experienced the prediction error also show the behavioral effect.

Reward-guided learning in humans and animals can often be modelled simply as reducing the difference between the obtained and expected reward: a reward prediction error. This well-established behavioral phenomenon [Rescorla, 1972] has been linked to the neurotransmitter dopamine [Schultz, 1997]. It has been shown that bursts of dopaminergic activity broadcast prediction errors to brain areas that are relevant for reward learning, such as the striatum, the amygdala, and the prefrontal cortex.

Another behavioral phenomenon that has been well studied is the effect of uncertainty and risk on decision making [Kahneman, 2013]. Here again, a different line of research has established an association between dopamine and risk-taking: dopamine-enhancing medication has been shown to increase risk-seeking in rats [St Onge, 2009], and to drive excessive gambling when used to treat Parkinson's disease [Voon, 2006; Gallagher, 2007; Weintraub, 2010]. More recently, it has been demonstrated that phasic responses in dopaminergic brain areas modulate moment-by-moment risk preference in humans: the tendency to take risks correlated positively with the magnitude of task-related dopamine responses [Chew, 2019]. A family of mechanistic theories of the basal ganglia network provides an explanation for these risk effects [Mikhael, 2016; Moeller, 2019]. According to these models, positive and negative outcomes of actions are encoded separately in the direct and indirect pathways of the basal ganglia. The balance between those pathways is controlled by the dopamine level. An increased dopamine level promotes the direct pathway and hence puts emphasis on potential gains, thus rendering risky options more attractive.

In summary, dopamine bursts are related to distinct behavioral phenomena, learning and risk-taking, by way of 1) acting as reward prediction errors, affecting synaptic weights during reinforcement learning…
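To illustrate the pathway-balance account described above, here is a minimal Python sketch in the spirit of the cited two-pathway models; the exact weighting scheme and the option values are our assumptions, not the models' published equations. Gains and losses feed separate channels, and a dopamine parameter shifts their balance, so higher dopamine makes a high-variance option look better than a value-matched safe one.

```python
import numpy as np

def two_pathway_value(outcomes, dopamine):
    """Illustrative two-pathway valuation (our assumed form): gains feed a
    'direct' channel G, losses an 'indirect' channel N; dopamine shifts
    the balance between the two channels."""
    outcomes = np.asarray(outcomes, dtype=float)
    G = outcomes[outcomes > 0].mean() if (outcomes > 0).any() else 0.0    # potential gains
    N = -outcomes[outcomes < 0].mean() if (outcomes < 0).any() else 0.0   # potential losses
    # Higher dopamine weights the gain channel more and the loss channel
    # less, so a risky option (large G and large N) gains subjective value.
    return (1 + dopamine) * G - (1 - dopamine) * N

risky = [2.0, -2.0]   # hypothetical risky option: large gain or large loss
safe = [0.1, -0.1]    # hypothetical safe option: small outcomes either way
for da in (-0.5, 0.0, 0.5):  # low, baseline, and high dopamine
    print(da, two_pathway_value(risky, da), two_pathway_value(safe, da))
```

At baseline dopamine the two options tie (both have zero expected value); low dopamine makes the risky option look worse than the safe one, and high dopamine makes it look better, which is the moment-by-moment risk-preference shift the cited models predict.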