Learning to associate unambiguous sensory cues with rewarded choices is known to be mediated by dopamine (DA) neurons. However, little is known about how these neurons behave when choices rely on uncertain reward-predicting stimuli. To study this issue we reanalyzed DA recordings from monkeys engaged in the detection of weak tactile stimuli delivered at random times and formulated a reinforcement learning model based on belief states. Specifically, we investigated how the firing activity of DA neurons should behave if they were coding the error in the prediction of the total future reward when animals made decisions relying on uncertain sensory and temporal information. Our results show that the same signal that codes for reward prediction errors also codes the animal's certainty about the presence of the stimulus and the temporal expectation of sensory cues.

dopamine activity | perception | temporal expectation | decision making | reinforcement learning

When an inexperienced animal hears a soft rustle in the nearby foliage, it does not associate this cue with the escaping prey that it observes immediately after. How does the animal learn that the correct action is to approach the source of the sound and try to catch the prey? In perceptual decision-making experiments, animals learn to make decisions based on their perception of weak sensory stimuli, receiving a reward for their correct choices, which they are taught to communicate by means of a specific motor action (1-7). The learning of these tasks is presumably mediated by the activity of midbrain dopamine (DA) neurons (8). Although DA recordings made while animals are engaged in making such difficult decisions are scarce, experiments on Pavlovian and instrumental conditioning have shown that under a novel stimulus-reward association, DA neurons respond to the unexpected reward with an activity burst.
Remarkably, after training this phasic response shifts to the conditioned stimulus, where it works as a signal predicting the future reward (8-12). From a computational standpoint, reinforcement learning (RL) methods (13) have been successfully applied to explain this and many other observations (ref. 14; for reviews see refs. 15-17). According to the reward prediction error (RPE) hypothesis (18, 19), the DA phasic activity signals an error in the prediction of the expected total reward (20-22) and is used to learn associations between rewards and task events. In classical and instrumental conditioning the reward acts as a reinforcement, strengthening its association with the stimulus, provided the animal follows the task instructions. In some experiments the reward was delivered only after the animal made a choice between alternative options (20, 23, 24). However, in those studies the task events were unambiguous: the animals' reports were mostly correct and there was a well-defined temporal relationship between the perceived stimulus and reward delivery. This is very different from the real-world situation described above, in which the reward is annou...
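The RPE idea, and the shift of the phasic DA response from the reward to the conditioned stimulus, can be illustrated with a minimal temporal-difference (TD) learning sketch. This is a generic illustration under simplified assumptions (a deterministic cue-reward interval, a complete-serial-compound state representation, and a baseline inter-trial state clamped at zero value), not the belief-state model developed in the paper; the parameter values are arbitrary.

```python
# TD(0) sketch of the reward prediction error (RPE) hypothesis:
#   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
# Hypothetical trial: a cue at t = 0 reliably predicts a reward at t = 5.
# States are time steps within the trial (complete serial compound).

T = 6                 # time steps per trial; reward arrives at the last step
gamma, alpha = 1.0, 0.1   # no discounting within the short trial; learning rate
V = [0.0] * (T + 1)       # value of each within-trial state; V[T] = 0 is terminal

def run_trial(V):
    """Run one trial, updating V in place; return the RPE at each step."""
    deltas = []
    # Cue onset is unpredictable, so the preceding inter-trial state has
    # value 0; the RPE at cue onset is therefore gamma * V[0] - 0.
    deltas.append(gamma * V[0])
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0    # reward only at the final step
        delta = r + gamma * V[t + 1] - V[t]
        V[t] += alpha * delta
        deltas.append(delta)
    return deltas

first = run_trial(V)          # naive animal: RPE occurs at reward time
for _ in range(2000):
    last = run_trial(V)       # trained animal: RPE has moved to the cue

print("early RPEs:", [round(d, 2) for d in first])
print("late  RPEs:", [round(d, 2) for d in last])
```

Before learning, the only nonzero prediction error occurs at the time of the unexpected reward; after training, the values V have propagated back along the within-trial states, the fully predicted reward elicits no error, and the error appears instead at cue onset, mirroring the transfer of the DA burst from reward to conditioned stimulus.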