Abstract. An initial dimension reduction forms an integral part of many analyses in
climate science. Different methods yield low-dimensional representations
that are based on differing aspects of the data. Depending on the features of
the data that are relevant for a given study, certain methods may be more
suitable than others, for instance yielding bases that can be more easily
identified with physically meaningful modes.
To illustrate the distinction between particular methods and identify
circumstances in which a given method might be preferred, in this paper we
present a set of case studies comparing the results obtained using the
traditional approaches of empirical orthogonal function (EOF) analysis and
k-means clustering with those obtained using more recently introduced methods,
such as archetypal analysis and convex coding.
For data such as global sea surface temperature anomalies, in which there is
a clear, dominant mode of variability, all of the methods considered
yield rather similar bases with which to represent the data, although they
differ in reconstruction accuracy for a given basis size. However, in the
absence of such a clear scale separation, as in the case of daily geopotential
height anomalies, the extracted bases differ much more substantially between
the methods.
In such cases, we highlight the importance of carefully identifying the
features of interest and of choosing the method that best targets precisely
those features, so as to obtain more easily interpretable results.