Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive not only to whether the choice itself was suboptimal, but also to whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this scenario, we used a modified version of a classic reinforcement learning task in which feedback indicated whether or not negative prediction errors were associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome.
Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors than when execution was successful but the reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction error in the striatum following execution failures.
These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.

Introduction

When a desired outcome is not obtained during instrumental learning, the agent should be compelled to learn why. For instance, if an opposing player hits a home run, a baseball pitcher needs to properly assign credit for the negative outcome: The error could have been in the decision about the chosen action (e.g., throwing a curveball rather than a fastball) or in the execution of that decision (e.g., letting the curveball break over the plate rather than away from the hitter, as planned). Here we ask if teaching signals in the striatum, a crucial region for reinforcement learning, are sensitive to this dissociation.
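The teaching signal at issue is the reward prediction error: the difference between the reward received and the reward expected, which in model-free accounts also drives a running-average update of the chosen action's value. As a rough illustration only (the function name and learning rate below are ours, not part of the study), this can be sketched as:

```python
# Minimal sketch of a model-free reward prediction error (RPE) and value update.
# The learning rate and variable names are illustrative assumptions.

def rpe_update(value, reward, alpha=0.1):
    """Return (rpe, updated_value) for a single trial outcome."""
    rpe = reward - value          # prediction error: received minus expected
    value = value + alpha * rpe   # value tracks a running average of past rewards
    return rpe, value

# Example: an unrewarded trial (reward = 0) after learning to expect reward
value = 0.8
rpe, value = rpe_update(value, reward=0.0)
print(rpe, value)
```

Here a non-reward produces a negative RPE (-0.8), which lowers the action's value (to 0.72); the question the study poses is whether this negative signal is attenuated when the non-reward is attributable to an execution failure rather than to the choice.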
The striatum is hypothesized to receive reward prediction error (RPE) signals -- the difference between received and expected rewards -- from midbrain dopamine neurons (Barto, 1995; Montague et al., 1996; Schultz et al., 1997). The most common description of an RPE is as a "model-free" error, computed relative to the scalar value of a particular action, which itself reflects a common currency based on a running average of previous rewards contingent on that action (Langdon et al., 2017). However, recent work suggests that RPE signals in the striatum can also reflect "model-based" information (Daw et al., 2011), where the prediction error is based on an internal simulation of future states. Moreover, human striatal RPEs have been shown to be affected by a slew of cognitive factors, including attention (Leong et al., 2017), episodic memory (Bornstein et al., 2017; Wimme...