In climate science, gridded observational datasets based on in situ measurements serve as evidence for scientific claims and are used to both calibrate and evaluate models. However, datasets represent only selected aspects of the real world, so they can be a source of uncertainty when used for a specific purpose. Here, we present a framework for understanding this dataset uncertainty that distinguishes three general sources: (1) uncertainty that arises during the generation of the dataset; (2) uncertainty due to biased samples; and (3) uncertainty that arises from the choice of abstract properties, such as resolution and metric. Based on this framework, we identify four types of dataset ensembles (parametric, structural, resampling, and property ensembles) as tools to understand and assess uncertainties arising from the use of datasets for a specific purpose. We advocate for a more systematic generation of such dataset ensembles. Finally, we discuss the use of dataset ensembles in climate model evaluation. We argue that a more systematic understanding and assessment of dataset uncertainty are needed to make uncertainty assessment in model evaluation more reliable. More systematic use of such a framework would benefit both scientific reasoning and scientific policy advice based on climate datasets.
This article is categorized under:
Paleoclimates and Current Trends > Modern Climate Change