This is an author-produced version of a paper published in: Neurocomputing 163 (2015)
Abstract

The growing interest in big data problems creates a need for unsupervised methods for data visualization and dimensionality reduction. Diffusion Maps (DM) is a recent technique that can capture the lower-dimensional geometric structure underlying the sample patterns in a way that can be made independent of the sampling distribution. Moreover, DM allows us to define an embedding whose Euclidean metric relates to the sample's intrinsic one, which in turn enables a principled application of k-means clustering. In this work we give a self-contained review of DM and discuss two methods to compute the DM embedding coordinates of new out-of-sample data. We then apply them to two meteorological data problems that involve, respectively, the temporal and spatial compression of numerical weather forecasts, and show, first, how DM can greatly reduce the initial dimension while still capturing relevant information in the original data and, second, how the sample-derived DM embedding coordinates can be extended to new patterns.