Annotated datasets for automatic white balance (AWB) are used to evaluate and, where necessary, train AWB methods. Relying on such datasets requires awareness of potential bias in their content and characteristics: some methods are designed to rely on the presence of particular elements, such as human skin, while others learn implicit relationships between image content and light properties from training data. This dependency makes it essential to understand whether the available datasets are actually representative of common application scenarios with respect to, for example, the presence of human subjects, the diversity of composition, and the illumination conditions. In this paper we review the most common AWB datasets, covering both single- and multiple-illuminant estimation, and provide a critical analysis of their characteristics. Furthermore, we select a number of existing methods for single-illuminant estimation as a representative pool of approaches of varying complexity, and we investigate how their performance correlates with the image content of common datasets.