Neural Prediction Errors Reveal a Risk-Sensitive Reinforcement-Learning Process in the Human Brain

Niv, Yaron; Edlund, Jeffrey A.; Dayan, Peter; O’Doherty, John P.

doi:10.1523/jneurosci.5498-10.2012

Cited by 349 publications

(522 citation statements)

References 75 publications

Supporting

Mentioning

473

Contrasting

Unclassified

“…The expected value was equivalent for the two targets on all trials; however, risk, defined in terms of hit probability, was not. Under such conditions, people tend to be risk-averse (2,6).…”

Section: Results (mentioning)

confidence: 99%

“…In experiment 1, participants were assigned to one of three conditions (n = 20/group). In the Standard condition, choices were indicated by pressing one of two keys, the typical response method in bandit tasks (1,2). Points were only earned on hit trials. Significance: Thorndike's Law of Effect states that when an action leads to a desirable outcome, that action is likely to be repeated.…”

Section: Results (mentioning)

confidence: 99%

“…To explore how movement errors might influence choice behavior, we examined several variants of a temporal difference (TD) reinforcement learning model (1,2,11). Current variants treat motor-related variables as factors influencing subjective utility (4).…”

Section: Results (mentioning)

confidence: 99%

“…Seven individuals had an identified genetic subtype, and five were of unknown etiology (Table S1). Two patients diagnosed with spinocerebellar ataxia type 3 (SCA-3) were excluded from the final analysis given that phenotypes of SCA-3 may also show degeneration and/or dysfunction of the basal ganglia (29), a region strongly implicated in reinforcement learning (1,2,21,25). (We note that the choice behavior of these two individuals was similar to that observed in the other 10 individuals with cerebellar degeneration.)…”

Section: Methods (mentioning)

confidence: 99%

“…Humans are highly capable of tracking the value of stimuli, varying their behavior on the basis of reinforcement history (1,2), and exhibiting sensitivity to intrinsic motor noise when reward outcomes depend on movement accuracy (3)(4)(5). In real-world behavior, the underlying cause of unrewarded events is often ambiguous: A lost point in tennis could occur because the player made a poor choice about where to hit the ball or failed to properly execute the stroke.…”

mentioning

confidence: 99%

Credit assignment in movement-dependent reinforcement learning

McDougle

Boggess

Crossley

et al. 2016

Proc. Natl. Acad. Sci. U.S.A.

When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants' explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.

decision-making | reinforcement learning | sensory prediction error | reward prediction error | cerebellum

When a diner reaches across the table and knocks over her coffee, the absence of anticipated reward should be attributed to a failure of coordination rather than diminish her love of coffee. Although this attribution is intuitive, current models of decision-making lack a mechanistic explanation for this seemingly simple computation. We set out to ask if, and how, selection processes in decision-making incorporate information specific to action execution and thus solve the credit assignment problem that arises when an expected reward is not obtained because of a failure in motor execution.

Humans are highly capable of tracking the value of stimuli, varying their behavior on the basis of reinforcement history (1, 2), and exhibiting sensitivity to intrinsic motor noise when reward outcomes depend on movement accuracy (3-5). In real-world behavior, the underlying cause of unrewarded events is often ambiguous: A lost point in tennis could occur because the player made a poor choice about where to hit the ball or failed to properly execute the stroke. However, in laboratory studies of reinforcement learning, the underlying cause of unrewarded events is typically unambiguous, either solely dependent on properties of the stimulus or on motor noise. Thus, it remains unclear how people assign credit to either extrins...
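The abstract above describes a model in which movement errors gate reward prediction errors. A minimal sketch of that idea follows, assuming a simple delta-rule value update; the function name, the `learning_rate` and `error_gate` parameters, and their default values are hypothetical illustrations and are not taken from the paper.

```python
def gated_value_update(value, reward, execution_error,
                       learning_rate=0.2, error_gate=0.3):
    """Return the updated value estimate for the chosen target.

    Hypothetical parameters (not from the paper):
      learning_rate -- step size applied to the reward prediction error
      error_gate    -- factor in [0, 1] that scales the prediction error
                       on trials with a movement execution error
    """
    prediction_error = reward - value
    # Assumption: the gate only attenuates learning on execution-error trials.
    gate = error_gate if execution_error else 1.0
    return value + learning_rate * gate * prediction_error


# An unrewarded trial lowers the estimate less when the miss is
# attributed to a movement error than when it is not.
after_clean_miss = gated_value_update(0.5, 0.0, execution_error=False)
after_motor_miss = gated_value_update(0.5, 0.0, execution_error=True)
print(after_clean_miss, after_motor_miss)
```

Under this sketch, a target that often fails because of execution errors loses value more slowly than one that fails for extrinsic reasons, which is one way the reversal of risk aversion in the reaching condition could arise.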

Role of the Medial Prefrontal Cortex in Impaired Decision Making in Juvenile Attention-Deficit/Hyperactivity Disorder

Iannaccone

et al. 2014

IMPORTANCE Attention-deficit/hyperactivity disorder (ADHD) has been associated with deficient decision making and learning. Models of ADHD have suggested that these deficits could be caused by impaired reward prediction errors (RPEs). Reward prediction errors are signals that indicate violations of expectations and are known to be encoded by the dopaminergic system. However, the precise learning and decision-making deficits and their neurobiological correlates in ADHD are not well known.

OBJECTIVE To determine the impaired decision-making and learning mechanisms in juvenile ADHD using advanced computational models, as well as the related neural RPE processes using multimodal neuroimaging.

DESIGN, SETTING, AND PARTICIPANTS Twenty adolescents with ADHD and 20 healthy adolescents serving as controls (aged 12-16 years) were examined using a probabilistic reversal learning task while simultaneous functional magnetic resonance imaging and electroencephalogram were recorded.

MAIN OUTCOMES AND MEASURES Learning and decision making were investigated by contrasting a hierarchical Bayesian model with an advanced reinforcement learning model and by comparing the model parameters. The neural correlates of RPEs were studied in functional magnetic resonance imaging and electroencephalogram.

RESULTS Adolescents with ADHD showed more simplistic learning as reflected by the reinforcement learning model (exceedance probability, Px = .92) and had increased exploratory behavior compared with healthy controls (mean [SD] decision steepness parameter β: ADHD, 4.83 [2.97]; controls, 6.04 [2.53]; P = .02). The functional magnetic resonance imaging analysis revealed impaired RPE processing in the medial prefrontal cortex during cue as well as during outcome presentation (P < .05, family-wise error correction). The outcome-related impairment in the medial prefrontal cortex could be attributed to deficient processing at 200 to 400 milliseconds after feedback presentation as reflected by reduced feedback-related negativity (ADHD, 0.61 [3.90] μV; controls, −1.68 [2.52] μV; P = .04).

CONCLUSIONS AND RELEVANCE The combination of computational modeling of behavior and multimodal neuroimaging revealed that impaired decision making and learning mechanisms in adolescents with ADHD are driven by impaired RPE processing in the medial prefrontal cortex. This novel, combined approach furthers the understanding of the pathomechanisms in ADHD and may advance treatment strategies.
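To illustrate why a lower decision steepness parameter β corresponds to more exploratory behavior, here is a minimal sketch that assumes β enters a standard softmax choice rule. The abstract does not state the exact choice rule, so the softmax form and the option values `[0.6, 0.4]` are assumptions; only the two group-mean β values are taken from the abstract.

```python
import math


def softmax_choice_probs(values, beta):
    """Softmax over option values with steepness (inverse temperature) beta."""
    highest = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - highest)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


option_values = [0.6, 0.4]  # hypothetical learned values for two options
probs_adhd = softmax_choice_probs(option_values, beta=4.83)      # reported ADHD mean
probs_controls = softmax_choice_probs(option_values, beta=6.04)  # reported control mean
print(probs_adhd, probs_controls)
```

With the same learned values, the smaller β puts less probability on the higher-valued option, so choices are spread more evenly across options.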

Models and Methods for Reinforcement Learning

Dayan

Nakahara

2018

Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience

Adaptive behavior requires learning predictions of rewards and punishments, and learning actions that increase the chance or magnitude of the former and avoid the latter. This is the purview of many fields; we consider it in the context of reinforcement learning, which was spawned by behavioral psychology and artificial intelligence, but now also encompasses a wealth of findings in neuroscience. We consider the foundational algorithms of reinforcement learning, such as direct and indirect actors, temporal difference learning, Q‐learning and the actor‐critic, discuss aspects of their links to sophisticated ideas from conditioning, and hint at the geography of the main aspects of the terrain that is currently being explored.
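As a concrete reference for the temporal difference family of algorithms named in this abstract, the following is a minimal sketch of the textbook tabular Q-learning update. It is not drawn from the chapter itself; the dictionary layout, the state and action names, and the `alpha` and `gamma` defaults are illustrative assumptions.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error.

    q maps each state to a dict of action values. A next_state that is
    missing from q is treated as terminal (future value of zero).
    """
    next_values = q.get(next_state)
    best_next = max(next_values.values()) if next_values else 0.0
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


# Hypothetical two-state example.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 0.5},
}
first_error = q_learning_update(q_table, "s0", "left", reward=0.0, next_state="s1")
final_error = q_learning_update(q_table, "s1", "left", reward=1.0, next_state="end")
print(first_error, final_error, q_table)
```

The temporal difference error computed here plays the same role as the reward prediction error discussed in the works listed above.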
