Dopamine and Reward Prediction Errors

How does the brain learn what feels good? How does it adapt to new foods, bad bets, and played-out songs? How is it tricked into wanting ever-increasing doses of heroin or cocaine? Is there a simple rule that links neural signaling to the fundamental approach and avoidance responses on which more complex behavior is built?

It is now almost 20 years since Schultz, Montague, and Dayan (1) gave a provocative answer to these questions, an answer that is now recognized as one of the cornerstones of reinforcement learning and neuroeconomics. In a series of prior papers (reviewed in ref. 2), Schultz and collaborators had demonstrated that dopamine-releasing cells in the ventral midbrain respond to the delivery of rewards with a burst of action potentials, that these cells also respond to cues predicting rewards, and that over the course of learning, dopamine responses shift from the delivery of the reward to the predictive cue itself. Schultz, Montague, and Dayan proposed a simple but powerful computational account of these findings: Dopamine encodes a key variable posited by theories of reinforcement learning. These theories hold that animals select behaviors on the basis of which ones they expect to result in reward, updating their beliefs according to the difference between expectations and observed outcomes, good or bad (3). This reward prediction error is large when rewards are unexpected and small when rewards are fully predicted, and its magnitude and sign drive the speed and direction of learning, respectively (see the sketch below).

This hypothesis proved seminal in several ways. First, it linked a particular neural signal to a computational model, thus permitting the development of quantitative predictions about the physiology of reward and learning. Second, because that computational model was a model of both learning and choice, it inspired subsequent studies probing the neurobiology of decision making. Finally, because many drugs of abuse directly affect dopamine function, the model provided a mathematical explanation for addiction. If dopamine encodes a prediction error, and if more dopamine signals that a reward is better than predicted, then every dose of cocaine or methamphetamine is more rewarding than expected, and the cues associated with these chemicals become powerful motivators for drug-seeking behavior. In other words, addiction is a normal learning process gone awry (4).
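As a rough illustration of the learning rule described above (not code from the original article), the following sketch implements a Rescorla-Wagner-style prediction error update; the function name, variable names, and learning rate are illustrative assumptions.

```python
# A minimal sketch of the reward prediction error update described in the text,
# in the spirit of Rescorla-Wagner / temporal-difference learning.
# All names and the learning rate below are illustrative assumptions.

def update_value(expected_value: float, reward: float, learning_rate: float = 0.1):
    """Return the prediction error and the updated reward expectation."""
    prediction_error = reward - expected_value  # large when the reward is unexpected
    new_value = expected_value + learning_rate * prediction_error  # sign sets direction, magnitude sets speed
    return prediction_error, new_value

# Example: a cue initially predicts no reward. Repeated rewards of 1.0 drive the
# expectation toward 1.0 while the prediction error shrinks toward zero, mirroring
# the shift of dopamine responses from the reward itself to the predictive cue.
value = 0.0
for trial in range(5):
    delta, value = update_value(value, reward=1.0)
    print(f"trial {trial}: prediction error = {delta:.2f}, expected value = {value:.2f}")
```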
Human Dopamine Recordings Challenge the Reward Prediction Error Hypothesis

The dopamine prediction error finding has been endorsed by dozens of neurobiological studies in animals, and indirectly supported by brain imaging studies in humans (5, 6). In PNAS, Kishida et al. (7) build upon this work by directly measuring dopamine release in the human striatum. Their remarkable findings directly challenge the now classic view that dopamine release simply encodes errors in reward prediction and instead suggest that dopamin...