Learning is difficult when the world fluctuates randomly and ceaselessly. Classical learning algorithms, such as the delta rule with a constant learning rate, are not optimal in such conditions. Mathematically, the optimal learning rule requires weighting prior knowledge and incoming evidence according to their respective reliabilities. This "confidence weighting" implies the maintenance of an accurate estimate of the reliability of what has been learned. Here, using fMRI and an ideal-observer analysis, we demonstrate that the brain's learning algorithm relies on confidence weighting. While in the fMRI scanner, human adults attempted to learn the transition probabilities underlying an auditory or visual sequence, and reported their confidence in those estimates. They knew that these transition probabilities could change simultaneously at unpredictable moments, and therefore that the learning problem was inherently hierarchical. Subjective confidence reports tightly followed the predictions derived from the ideal observer. In particular, subjects managed to attach distinct levels of confidence to each learned transition probability, as required by Bayes-optimal inference. Distinct brain areas tracked the likelihood of new observations given current predictions, and the confidence in those predictions. Both signals were combined in the right inferior frontal gyrus, where they operated in agreement with the confidence-weighting model. This brain region also presented signatures of a hierarchical process that disentangles distinct sources of uncertainty.
Together, our results provide evidence that the sense of confidence is an essential ingredient of probabilistic learning in the human brain, and that the right inferior frontal gyrus hosts a confidence-based statistical learning algorithm for auditory and visual sequences.

The sensory data that we receive from our environment often contain temporal regularities: for instance, the colors of traffic lights change according to a predictable green-yellow-red pattern; thunder is often followed by rain; etc. Knowledge of those hidden regularities is often acquired through learning, by aggregating successive observations into summary estimates (e.g., the probability of the light turning red when it is currently yellow). When sensory data are received sequentially, learning can be described as an iterative process that updates the internal estimates each time a new observation is received. Learners must therefore constantly balance two sources of information: their current estimates and the new incoming observations. Any learning algorithm must find a solution to this balancing act. Finding the correct balance is especially critical in a world that is both stochastic and changing (1), i.e., where observations are governed by probabilities that can themselves change over time (a situation called volatility). An excessive reliance on incoming observations will leave the learned estimates dominated by random fluctuations instead of converging to the true underlying probabilities. Conversely, an excessive reliance o...
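The contrast drawn above, between a delta rule with a fixed learning rate and confidence-weighted updating, can be illustrated with a minimal sketch. The code below is not the ideal-observer model used in the study (which tracks transition probabilities and change points); it is a simpler, assumed example that estimates a single Bernoulli probability, comparing a constant-learning-rate delta rule with a conjugate Beta-Bernoulli observer whose effective learning rate shrinks as accumulated evidence (a proxy for confidence) grows.

```python
import random

def delta_rule(observations, alpha=0.1):
    """Delta rule: every new observation receives the same fixed weight alpha."""
    p = 0.5
    for x in observations:
        p += alpha * (x - p)  # constant learning rate, regardless of confidence
    return p

def beta_observer(observations, a=1.0, b=1.0):
    """Conjugate Bayesian estimate of a Bernoulli probability.

    The posterior mean moves by 1 / (a + b + 1) toward each new observation,
    so the effective learning rate decreases as evidence accumulates: prior
    knowledge and incoming data are weighted by their respective reliabilities.
    """
    for x in observations:
        a += x
        b += 1 - x
    mean = a / (a + b)
    evidence = a + b  # grows with observations; higher means more confidence
    return mean, evidence

# Simulated stationary environment: outcomes drawn with true probability 0.7.
random.seed(0)
obs = [1 if random.random() < 0.7 else 0 for _ in range(500)]

# The Bayesian estimate converges toward the true probability, while the
# fixed-rate delta rule keeps tracking recent fluctuations.
print("delta rule:", delta_rule(obs))
print("Bayesian  :", beta_observer(obs))
```

In a volatile environment, the pure Beta observer would instead be too rigid once `a + b` is large; handling change points requires the hierarchical inference described in the text, which re-inflates uncertainty when the statistics appear to shift.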