In this paper, we present a probabilistic generative approach for constructing topographic maps of tree-structured data. Our model defines a low-dimensional manifold of local noise models, namely, (hidden) Markov tree models, induced by a smooth mapping from low-dimensional latent space. We contrast our approach with that of topographic map formation using recursive neural-based techniques, namely, the self-organizing map for structured data (SOMSD) (Hagenbuchner , 2003). The probabilistic nature of our model brings a number of benefits: 1) naturally defined cost function that drives the model optimization; 2) principled model comparison and testing for overfitting; 3) a potential for transparent interpretation of the map by inspecting the underlying local noise models; 4) natural accommodation of alternative local noise models implicitly expressing different notions of structured data similarity. Furthermore, in contrast with the recursive neural-based approaches, the smooth nature of the mapping from the latent space to the local model space allows for calculation of magnification factors--a useful tool for the detection of data clusters. We demonstrate our approach on three data sets: a toy data set, an artificially generated data set, and on a data set of images represented as quadtrees.
In the era of rapidly increasing amounts of time series data, classification of variable objects has become the main objective of time-domain astronomy. Classification of irregularly sampled time series is particularly difficult because the data cannot be represented naturally as a vector which can be directly fed into a classifier. In the literature, various statistical features serve as vector representations.In this work, we represent time series by a density model. The density model captures all the information available, including measurement errors. Hence, we view this model as a generalisation to the static features which directly can be derived, e.g., as moments from the density. Similarity between each pair of time series is quantified by the distance between their respective models. Classification is performed on the obtained distance matrix.In the numerical experiments, we use data from the OGLE and ASAS surveys and demonstrate that the proposed representation performs up to par with the best currently used feature-based approaches. The density representation preserves all static information present in the observational data, in contrast to a less complete description by features. The density representation is an upper boundary in terms of information made available to the classifier. Consequently, the predictive power of the proposed classification depends on the choice of similarity measure and classifier, only. Due to its principled nature, we advocate that this new approach of representing time series has potential in tasks beyond classification, e.g., unsupervised learning.
We present an approach for the visualisation of a set of time series that combines an echo state network with an autoencoder. For each time series in the dataset we train an echo state network, using a common and fixed reservoir of hidden neurons, and use the optimised readout weights as the new representation. Dimensionality reduction is then performed via an autoencoder on the readout weight representations. The crux of the work is to equip the autoencoder with a loss function that correctly interprets the reconstructed readout weights by associating them with a reconstruction error measured in the data space of sequences. This essentially amounts to measuring the predictive performance that the reconstructed readout weights exhibit on their corresponding sequences when plugged back into the echo state network with the same fixed reservoir. We demonstrate that the proposed visualisation framework can deal both with real valued sequences as well as binary sequences. We derive magnification factors in order to analyse distance preservations and distortions in the visualisation space. The versatility and advantages of the proposed method are demonstrated on datasets of time series that originate from diverse domains.
We present a probabilistic generative approach for constructing topographic maps of light curves from eclipsing binary stars. The model defines a low-dimensional manifold of local noise models induced by a smooth non-linear mapping from a low-dimensional latent space into the space of probabilistic models of the observed light curves. The local noise models are physical models that describe how such light curves are generated. Due to the principled probabilistic nature of the model, a cost function arises naturally and the model parameters are fitted via MAP estimation using the Expectation-Maximisation algorithm. Once the model has been trained, each light curve may be projected to the latent space as the the mean posterior probability over the local noise models. We demonstrate our approach on a dataset of artificially generated light curves and on a dataset comprised of light curves from real observations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.