The authors present their primary value learned value (PVLV) model for understanding the reward-predictive firing properties of dopamine (DA) neurons as an alternative to the temporal-differences (TD) algorithm. PVLV is more directly related to underlying biology and is also more robust to variability in the environment. The primary value (PV) system controls performance and learning during primary rewards, whereas the learned value (LV) system learns about conditioned stimuli. The PV system is essentially the Rescorla-Wagner/delta-rule and comprises the neurons in the ventral striatum/nucleus accumbens that inhibit DA cells. The LV system comprises the neurons in the central nucleus of the amygdala that excite DA cells. The authors show that the PVLV model can account for critical aspects of the DA firing data, making a number of clear predictions about lesion effects, several of which are consistent with existing data. For example, first- and second-order conditioning can be anatomically dissociated, which is consistent with PVLV and not TD. Overall, the model provides a biologically plausible framework for understanding the neural basis of reward learning.

Keywords: basal ganglia, dopamine, reinforcement learning, Pavlovian conditioning, computational modeling

An important and longstanding challenge for both the cognitive neuroscience and artificial intelligence communities has been to develop an adequate understanding (and a correspondingly robust model) of Pavlovian learning. Such a model should account for the full range of signature findings in the rich literature on this phenomenon.
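To make the Rescorla-Wagner/delta rule named in the abstract concrete, the following is a minimal sketch of the textbook single-stimulus form of that rule. It is not the authors' PVLV implementation; the function name `rescorla_wagner`, the learning rate of 0.1, and the 50/50 trial schedule are all illustrative assumptions.

```python
def rescorla_wagner(rewards, learning_rate=0.1, initial_value=0.0):
    """Textbook delta rule for one stimulus (illustrative, not the PVLV code).

    Returns the reward expectation held at the start of each trial and the
    prediction error (delta) computed on that trial.
    """
    value = initial_value
    values, deltas = [], []
    for reward in rewards:
        delta = reward - value          # actual reward minus expected reward
        values.append(value)
        deltas.append(delta)
        value += learning_rate * delta  # move the expectation toward the outcome
    return values, deltas


# Hypothetical schedule: 50 rewarded trials, then 50 unrewarded (extinction) trials.
values, deltas = rescorla_wagner([1.0] * 50 + [0.0] * 50)
```

Under these assumptions the prediction error starts at its maximum, shrinks toward zero as the reward becomes expected, and turns negative on the first trial where the expected reward is omitted.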
Pavlovian conditioning refers to the ability of previously neutral stimuli that reliably co-occur with primary rewards to elicit new conditioned behaviors and to take on reward value themselves (e.g., Pavlov's famous case of the bell signaling food for hungry dogs; Pavlov, 1927).

Pavlovian conditioning is distinguished from instrumental conditioning in that the latter involves the learning of new behaviors that are reliably associated with reward, either first order (US) or second order (CS). Although Pavlovian conditioning also involves behaviors (conditioned and unconditioned responses), reward delivery is not contingent on behavior but is instead reliably paired with a stimulus regardless of behavior. In contrast, instrumental conditioning explicitly makes reward contingent on a particular "operant" or "instrumental" response. Both stimulus-reward (Pavlovian) and stimulus-response-reward (instrumental) associations, however, are thought to be trained by the same phasic dopamine signal that occurs at the time of primary reward (US), as described below. In practice, the distinction is often blurry because the two types of conditioning interact (e.g., second-order instrumental conditioning and so-called Pavlovian-instrumental transfer effects).

The dominant theoretical perspective for both Pavlovian and instrumental conditioning since the seminal Rescorla and Wagner (1972) model is that learning is based on the discrepancy between act...