Dopamine (DA) neurons in the ventral tegmental area (VTA) are thought to encode reward prediction errors (RPE) by comparing actual and expected rewards. In recent years, much work has been done to identify how the brain uses and computes this signal. While several lines of evidence suggest that the interplay of the DA and the inhibitory interneurons in the VTA implements the RPE computation, it remains unclear how the DA neurons learn key quantities, for example the amplitude and the timing of primary rewards during conditioning tasks. Furthermore, exogenous nicotine and endogenous acetylcholine, acting on both VTA DA and GABA (γ-aminobutyric acid) neurons via nicotinic acetylcholine receptors (nAChRs), also likely affect these computations. To explore the potential circuit-level mechanisms for RPE computations during classical-conditioning tasks, we developed a minimal computational model of the VTA circuitry. The model was designed to account for several reward-related properties of VTA afferents and recent findings on VTA GABA neuron dynamics during conditioning.

With our minimal model, we showed that the RPE can be learned by a two-speed process computing reward timing and magnitude. Including a model of nAChR-mediated currents in the VTA DA-GABA circuit, we also showed that nicotine should reduce the acetylcholine action on the VTA GABA neurons by receptor desensitization and therefore potentially boost the DA responses to reward information. Together, our results delineate the mechanisms by which RPEs are computed in the brain, and suggest a hypothesis on nicotine-mediated effects on reward-related perception and decision-making.

Keywords: dopamine, reward-prediction error, ventral tegmental area, acetylcholine, nicotine

Deperrois and Gutkin: Nicotinic Modulation of Dopaminergic Reward Computation Circuitry

[...] outcomes (rewards, punishments, etc.). The difference between prediction and outcome is the prediction error, which in turn can serve as a teaching signal that allows the animal to update its predictions and turn previously neutral stimuli that predict rewards into reinforcers of behavior. In particular, dopamine (DA) neuron activity in the ventral tegmental area (VTA) has been shown to encode the reward prediction error (RPE), or the difference between the actual reward the animal receives and the expected reward [...] classical conditioning with appetitive rewards, unexpected rewards elicit strong transient increases in VTA DA neuron activity, but once a cue fully predicts the reward, the same reward produces little or no DA neuron response. Finally, after learning, if the reward is omitted, DA neurons pause their firing at the moment the reward is expected (Schultz et al., 1997; Schultz, 1998; Keiflin and Janak, 2015; Watabe-Uchida et al., 2017). Thus DA neurons should either receive or compute the RPE. While several lines of evidence have pointed towards the RPE being computed by the VTA local circu...