Cluster analysis of f0 contours has recently become a popular method in phonetic research. It provides an automated way of categorising f0 contours, which gives new insights into (phonological) categories of intonation that vary across languages. As cluster analysis can be performed in many different ways, it is important to understand the extent to which these analyses capture human perception of f0. This study focuses on the way in which f0 contours and the differences among them are represented numerically, a crucial methodological choice preceding cluster analysis. These representations are compared to the way in which f0 contour differences are perceived by listeners of two different languages. To this end, four time-series contour representations (equivalent rectangular bandwidth, standardisation, octave-median rescaling, first derivative) and three distance measures [Euclidean distance (L2 norm), Pearson correlation, and dynamic time warping] were tested. The perceived differences were obtained from listeners of German and Papuan Malay, two typologically different languages. Results show that computed contour differences reflect human perception moderately, with dynamic time warping applied to the first derivative of the contour performing best and showing minimal differences between the languages.
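
To make the named representations and distance measures concrete, the following is a minimal Python sketch, not the implementation used in the study. All function names are hypothetical, and several details are assumptions: the ERB conversion uses the Glasberg and Moore ERB-rate formula, standardisation is taken to be a per-contour z-score, octave-median rescaling is taken to be the base-2 logarithm of f0 divided by the contour median, the first derivative is approximated by successive differences, the Pearson correlation is turned into a distance as 1 - r, and dynamic time warping uses an absolute-difference local cost without any window constraint.

```python
import numpy as np

# --- Contour representations (each formula is an assumption, see lead-in) ---

def hz_to_erb(f0_hz):
    """Convert Hz to ERB-rate units (Glasberg & Moore formula, assumed here)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f0_hz, dtype=float))

def standardise(f0):
    """Z-score a single contour: subtract its mean, divide by its std."""
    f0 = np.asarray(f0, dtype=float)
    return (f0 - f0.mean()) / f0.std()

def octave_median_rescale(f0_hz):
    """Express f0 in octaves relative to the contour's own median."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    return np.log2(f0_hz / np.median(f0_hz))

def first_derivative(f0):
    """Approximate the derivative by differences between adjacent samples."""
    return np.diff(np.asarray(f0, dtype=float))

# --- Distance measures ---

def euclidean(a, b):
    """L2 norm of the pointwise difference (contours must have equal length)."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def pearson_distance(a, b):
    """1 - Pearson r: 0 for identical shapes, 2 for mirrored shapes."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])

def dtw(a, b):
    """Classic dynamic time warping cost with |a_i - b_j| as local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Two synthetic contours (illustrative values only, not data from the study).
rise = np.linspace(180.0, 240.0, 20)
fall = np.linspace(240.0, 180.0, 20)

# The combination reported as performing best: DTW on the first derivative.
d_best = dtw(first_derivative(rise), first_derivative(fall))
```

Any representation can be paired with any distance in the same way, for example `euclidean(hz_to_erb(rise), hz_to_erb(fall))`. Unlike the Euclidean and Pearson measures, the DTW function accepts contours of different lengths.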