Dopamine (DA) neurons of the ventral tegmental area (VTA) track external cues and rewards to generate a reward prediction error (RPE) signal during Pavlovian conditioning. Here we explored how RPE is implemented for a self-paced, operant task in freely moving mice. The animal could trigger a reward-predicting cue by remaining in a specific location of an operant box for a brief time before moving to a spout for reward collection. In vivo single-unit recordings revealed phasic responses to the cue and reward in correct trials, while with failures the activity paused, reflecting positive and negative error signals of a reward prediction. In addition, a majority of VTA DA neurons also encoded parameters of the goal-directed action (e.g. movement velocity, acceleration, distance to goal and licking) by changes in tonic firing rate. Such multiplexing of individual neurons was only apparent while the mouse was engaged in the task. We conclude that a multiplexed internal representation during the task modulates VTA DA neuron activity, indicating a multimodal prediction error that shapes behavioral adaptation of a self-paced goal-directed action.

19. In brief, mice had to find an unmarked "trigger zone" in the operant box and remain there two seconds, which would activate a light cue. Once the cue was presented, the animal had four seconds to collect the reward by licking for a drop of fat solution at a spout located at the other side of the box. The animal could engage in the next trial at its own pace. We performed single-unit recordings of the VTA and video recorded the movement of the mouse. We found that VTA DA neurons multiplexed phasic responses to salient events with tonic activity reflecting parameters of the motor output.
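The trial logic described above can be illustrated with a minimal Python sketch. It is not the acquisition or analysis code used in this study. The `Sample` fields, the function name `score_trials`, and the rule that a trial counts as a failure when no lick occurs within the reward window are assumptions made for illustration. Only the 2 s dwell time and the 4 s reward window are taken from the text.

```python
from dataclasses import dataclass

DWELL_S = 2.0          # required stay in the trigger zone (from the text)
REWARD_WINDOW_S = 4.0  # time allowed to collect the reward after cue onset (from the text)


@dataclass
class Sample:
    """One time step of tracked behavior (hypothetical format)."""
    t: float        # time in seconds
    in_zone: bool   # True if the tracked position lies inside the trigger zone
    lick: bool      # True if a lick at the spout was detected


def score_trials(samples):
    """Return (cue_time, outcome) pairs, with outcome 'correct' or 'failure'.

    Assumption: a trial fails when no lick is detected within
    REWARD_WINDOW_S of cue onset. A cue still active at the end of the
    recording is left unscored.
    """
    trials = []
    zone_entry = None  # start of the current uninterrupted zone visit
    cue_time = None    # onset of the active cue, if any

    for s in samples:
        if cue_time is not None:
            # A cue is active: wait for a lick or for the window to expire.
            if s.t - cue_time > REWARD_WINDOW_S:
                trials.append((cue_time, "failure"))
                cue_time = None
            elif s.lick:
                trials.append((cue_time, "correct"))
                cue_time = None
            if cue_time is not None:
                continue  # cue still active, nothing else to update
            zone_entry = None  # trial closed; this sample may start a new dwell

        if not s.in_zone:
            zone_entry = None            # leaving the zone resets the dwell timer
        elif zone_entry is None:
            zone_entry = s.t             # first sample of a new zone visit
        elif s.t - zone_entry >= DWELL_S:
            cue_time = s.t               # dwell criterion met: cue onset
            zone_entry = None

    return trials
```

Because the task is self-paced, the sketch has no fixed inter-trial interval: a new dwell can begin as soon as the previous trial has been scored.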
Results
Operant spatial task and behavioral performance. We injected a virus expressing Cre-dependent channelrhodopsin (ChR2) and implanted a 16-channel optrode mounted into a microdrive into the VTA of DAT-Cre mice (Fig. 1a). After recovery, we started the pre-training phase that lasted five to ten days where the mice were conditioned in a cue-reward paradigm (Fig. 1b,c). The mice learned to associate a randomly occurring 4 s light stimulus (cue) with the availability of a drop of a fat solution (5% of lipofundin, BBraun, Sempach, Switzerland). We then switched to the cue-guided spatial navigational task for five to twenty days (Fig. 1c bottom timeline), where the cue was triggered once the mouse had spent 2 s in a small (4 × 4 cm), unmarked trigger zone (TZ) of the operant chamber (grey dotted square in Fig. 1b). To collect the reward the mouse had to move to the other end of the box.

Within a few sessions, all mice found the TZ and the reward rate increased whereas the median inter-reward interval decreased in the first days (Fig. 1f and Fig. S1). Since the median inter-reward interval was insensitive to slow initiation or occasional breaks, we chose i...