Abstract. Clustering – the automated grouping of similar data – can
provide powerful and unique insight into large and complex data sets, in a
fast and computationally efficient manner. While clustering has been used in
a variety of fields (from medical image processing to economics), its
application within atmospheric science has been fairly limited to date, and
the potential benefits of the application of advanced clustering techniques
to climate data (both model output and observations) have yet to be fully
realised. In this paper, we explore the specific application of clustering to
a multi-model climate ensemble. We hypothesise that clustering techniques can
provide (a) a flexible, data-driven method of testing model–observation
agreement and (b) a mechanism with which to identify model development
priorities. We focus our analysis on chemistry–climate model (CCM) output of
tropospheric ozone – an important greenhouse gas – from the recent
Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP).
Tropospheric column ozone from the ACCMIP ensemble was clustered using the
Data Density based Clustering (DDC) algorithm. We find that a multi-model
mean (MMM) calculated using members of the most-populous cluster identified
at each location offers a reduction of up to ∼ 20 % in the global
absolute mean bias between the MMM and an observed satellite-based
tropospheric ozone climatology, with respect to a simple, all-model MMM. On a
spatial basis, the bias is reduced at ∼ 62 % of all locations, with
the largest bias reductions occurring in the Northern Hemisphere – where
ozone concentrations are relatively large. However, the bias is unchanged at
9 % of all locations and increases at 29 %, particularly in the
Southern Hemisphere. These increases demonstrate that, although cluster-based
subsampling acts to remove outlier model data, such data may in fact be
closer to observed values in some locations. We further demonstrate that
clustering can provide a viable and useful framework in which to assess and
visualise model spread, offering insight into geographical areas of agreement
among models and a measure of diversity across an ensemble. Finally, we
discuss caveats of the clustering techniques and note that while we have
focused on tropospheric ozone, the principles underlying the cluster-based
MMMs are applicable to other prognostic variables from climate models.
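
As an illustration of the cluster-based MMM described above, the following is a minimal sketch and not the implementation used in this study. It assumes that cluster labels for each model at each location are already available (for example, from a clustering step such as DDC, which is not reproduced here). The function names `cluster_based_mmm` and `mean_absolute_bias`, the array layout, the tie-breaking rule and the unweighted (non-area-weighted) bias are all assumptions made for illustration.

```python
import numpy as np


def cluster_based_mmm(values, labels):
    """Mean over the models in the most-populous cluster at each location.

    values : array (n_models, n_locations), e.g. tropospheric column ozone
    labels : array (n_models, n_locations), integer cluster id per model
             and location (assumed to come from a separate clustering step)
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    n_locations = values.shape[1]
    mmm = np.empty(n_locations)
    for j in range(n_locations):
        ids, counts = np.unique(labels[:, j], return_counts=True)
        # Assumption: ties are broken in favour of the smallest cluster id.
        largest = ids[np.argmax(counts)]
        mmm[j] = values[labels[:, j] == largest, j].mean()
    return mmm


def mean_absolute_bias(mmm, obs):
    """Unweighted mean absolute bias (assumption: no area weighting)."""
    return float(np.mean(np.abs(np.asarray(mmm) - np.asarray(obs))))


# Hypothetical toy data: 4 models, 2 locations (not ACCMIP values).
values = np.array([[30.0, 20.0],
                   [31.0, 21.0],
                   [32.0, 40.0],
                   [50.0, 22.0]])
labels = np.array([[0, 0],
                   [0, 0],
                   [0, 1],
                   [1, 0]])
obs = np.array([31.0, 22.0])

simple_mmm = values.mean(axis=0)
cluster_mmm = cluster_based_mmm(values, labels)
print(mean_absolute_bias(simple_mmm, obs), mean_absolute_bias(cluster_mmm, obs))
```

In this toy case the outlier model at each location falls outside the most-populous cluster, so the cluster-based MMM lies closer to the hypothetical observations than the all-model MMM. As noted above, the reverse can occur where the excluded models happen to be closer to the observed values.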