To accurately predict rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when observations are noisy, individual rewards should have less influence on the tracking of the average reward, and the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of observation noise might be tracked and used to control prediction updates in the brain's reward system. Here, we introduce a new model that uses simple, tractable learning rules to track the mean and standard deviation of reward, and leverages prediction errors scaled by uncertainty as the central feedback signal. We provide a normative analysis comparing the performance of the new model with that of conventional models in a value-tracking task, and find that the new model has an advantage over conventional models across a range of observation-noise levels. Further, we propose a possible biological implementation of the model in the basal ganglia circuit. The scaled prediction-error feedback signal is consistent with experimental findings on the scaling of dopaminergic prediction errors relative to reward magnitude, and the update rules are consistent with many features of striatal plasticity. Our results span the levels of implementation, algorithm, and computation, and may have important implications for understanding the dopaminergic prediction-error signal and its relation to adaptive and effective learning.
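The idea of uncertainty-scaled updates can be sketched in a few lines of code. This is an illustrative sketch under our own simplifying assumptions, not the paper's exact rules: one delta rule tracks the running mean of reward, a second delta rule tracks the mean absolute prediction error (which is proportional to the standard deviation for Gaussian noise), and the mean is updated using the prediction error divided by the current uncertainty estimate, so noisier observations produce smaller effective updates. The function name `track_reward_stats` and the parameter values are hypothetical.

```python
import random

def track_reward_stats(rewards, alpha=0.1, sigma_min=1e-3):
    """Track running estimates of the mean (v) and spread (s) of reward.

    Sketch only: the feedback signal is the prediction error scaled by
    the current uncertainty estimate, so the effective step size on the
    mean shrinks as the estimated observation noise grows.
    """
    v, s = 0.0, 1.0  # initial estimates of mean and spread
    for r in rewards:
        delta = r - v                        # raw prediction error
        scaled = delta / max(s, sigma_min)   # uncertainty-scaled feedback
        v += alpha * scaled                  # smaller update when s is large
        s += alpha * (abs(delta) - s)        # delta rule toward mean |error|
    return v, s

# Example: noisy rewards drawn around a true mean of 5
random.seed(0)
rewards = [5 + random.gauss(0, 2) for _ in range(4000)]
v, s = track_reward_stats(rewards)
```

After many observations, `v` settles near the true mean and `s` near a value proportional to the noise level; doubling the noise roughly halves the effective learning rate on the mean, which is the qualitative behavior the abstract attributes to the model.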