Accurately decoding external variables from observations of neural activity is a major challenge in systems neuroscience. Bayesian decoders, which provide probabilistic estimates, are among the most widely used. Here we show how, in many common settings, the probabilistic predictions made by traditional Bayesian decoders are overconfident. That is, the estimates for the decoded stimulus or movement variables are more certain than they should be. We then show how Bayesian decoding with latent variables, which accounts for low-dimensional shared variability in the observations, can improve calibration, although additional correction for overconfidence is still needed. We examine: 1) decoding the direction of grating stimuli from spike recordings in primary visual cortex in monkeys, 2) decoding movement direction from recordings in primary motor cortex in monkeys, 3) decoding natural images from multi-region recordings in mice, and 4) decoding position from hippocampal recordings in rats. For each setting we characterize the overconfidence, and we describe a possible method to correct the miscalibration post hoc. Properly calibrated Bayesian decoders may alter theoretical results on probabilistic population coding and lead to brain-machine interfaces that more accurately reflect confidence levels when identifying external variables.

Significance Statement

Bayesian decoding is a statistical technique for making probabilistic predictions about external stimuli or movements based on recordings of neural activity. These predictions may be useful for robust brain-machine interfaces or for understanding perceptual or behavioral confidence. However, the probabilities produced by these models do not always match the observed outcomes. Just as a weather forecast predicting a 50% chance of rain may not correspond to rain actually occurring 50% of the time, Bayesian decoders of neural activity can be miscalibrated. Here we identify and measure miscalibration of Bayesian decoders for neural spiking activity in a range of experimental settings. We compare multiple statistical models and demonstrate how overconfidence can be corrected.
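To make the notion of calibration concrete, the sketch below illustrates one common way to check whether a decoder's predicted probabilities match observed outcome frequencies, in the spirit of the weather-forecast analogy. It is a toy illustration and not code from this study: the posteriors and ground-truth labels are synthetic placeholders, and the probability binning is one arbitrary choice among many.

```python
# Minimal sketch (assumptions, not the paper's code) of a calibration check
# for a Bayesian decoder with a discrete stimulus set. `posterior` holds each
# trial's decoded probability for every candidate stimulus; `true_idx` holds
# the index of the stimulus actually presented on that trial.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_stimuli = 5000, 8                                  # e.g., 8 grating directions
posterior = rng.dirichlet(np.ones(n_stimuli), size=n_trials)   # stand-in posteriors
# Stand-in ground truth drawn from each trial's posterior, so this toy decoder
# is calibrated by construction; a real overconfident decoder would not be.
true_idx = np.array([rng.choice(n_stimuli, p=p) for p in posterior])

# Pair every predicted probability with whether that stimulus actually occurred.
pred = posterior.ravel()
outcome = np.zeros_like(posterior)
outcome[np.arange(n_trials), true_idx] = 1.0
outcome = outcome.ravel()

# Reliability curve: within bins of predicted probability, compare the mean
# prediction to the empirical frequency of the outcome.
edges = np.linspace(0.0, 1.0, 11)
bin_id = np.digitize(pred, edges[1:-1])
for b in range(len(edges) - 1):
    mask = bin_id == b
    if mask.any():
        print(f"predicted ~{pred[mask].mean():.2f}  observed {outcome[mask].mean():.2f}")
# A well-calibrated decoder keeps these two columns close; overconfidence
# appears as observed frequencies falling below the predicted probabilities.
```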