Sparse versions of principal component analysis (PCA) have established themselves as simple yet powerful ways of selecting relevant features of high-dimensional data in an unsupervised manner. However, when several sparse principal components are computed, the selected variables are difficult to interpret, since each axis has its own sparsity pattern and must be interpreted separately. To overcome this drawback, we propose a Bayesian procedure called globally sparse probabilistic PCA (GSPPCA) that yields several sparse components sharing the same sparsity pattern, allowing the practitioner to identify the original variables that are relevant for describing the data. To this end, using Roweis' probabilistic interpretation of PCA and a Gaussian prior on the loading matrix, we provide the first exact computation of the marginal likelihood of a Bayesian PCA model. To avoid the drawbacks of discrete model selection, we present a simple relaxation of this framework that finds a path of models using a variational expectation-maximization algorithm; the exact marginal likelihood is then maximized over this path. This approach is illustrated on real and synthetic data sets. In particular, on unlabeled microarray data, GSPPCA infers much more relevant gene subsets than traditional sparse PCA algorithms.
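The probabilistic interpretation of PCA that the abstract builds on can be checked numerically. Under the model x = Wz + ε, with z ~ N(0, I_k) and ε ~ N(0, σ²I_d), the marginal distribution of x is N(0, WWᵀ + σ²I_d); this closed-form marginal is what makes likelihood-based model comparison possible. The sketch below (with arbitrary illustrative sizes, not the GSPPCA algorithm itself) verifies the identity by sampling:

```python
# Minimal sketch of probabilistic PCA's marginal: x = W z + eps implies
# x ~ N(0, W W^T + sigma^2 I). The dimensions and sigma^2 below are
# arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma2, n = 4, 2, 0.5, 200_000

W = rng.normal(size=(d, k))                       # loading matrix
Z = rng.normal(size=(n, k))                       # latent codes z ~ N(0, I_k)
X = Z @ W.T + np.sqrt(sigma2) * rng.normal(size=(n, d))

C_model = W @ W.T + sigma2 * np.eye(d)            # implied marginal covariance
C_sample = X.T @ X / n                            # empirical covariance of x

print(np.max(np.abs(C_model - C_sample)))         # small, shrinks as n grows
```

The empirical covariance of the simulated x matches WWᵀ + σ²I up to Monte Carlo error, confirming the Gaussian marginal used throughout the PPCA literature.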
Context. The classification of the minor bodies of the Solar System based on observables has been continuously developed and iterated over the past 40 years. While prior iterations followed either the availability of large observational campaigns or new instrumental capabilities opening new observational dimensions, we see the opportunity to improve primarily upon the established methodology. Aims. We developed an iteration of the asteroid taxonomy which allows the classification of partial and complete observations (i.e. visible, near-infrared, and visible-near-infrared spectrometry) and which reintroduces the visual albedo into the classification observables. The resulting class assignments are given probabilistically, enabling the uncertainty of a classification to be quantified. Methods. We built the taxonomy based on 2983 observations of 2125 individual asteroids, representing an almost tenfold increase in sample size compared with the previous taxonomy. The asteroid classes are identified in a lower-dimensional representation of the observations using a mixture of common factor analysers model. Results. We identify 17 classes split into the three complexes C, M, and S, including the new Z-class for extremely red objects in the main belt. The visual albedo information resolves the spectral degeneracy of the X-complex and establishes the P-class as part of the C-complex. We present a classification tool which computes probabilistic class assignments within this taxonomic scheme from asteroid observations, intrinsically accounting for degeneracies between classes based on the observed wavelength region. The taxonomic classifications of 6038 observations of 4526 individual asteroids are published. Conclusions. The ability to classify partial observations and the reintroduction of the visual albedo into the classification provide a taxonomy which is well suited for current and future datasets of asteroid observations, in particular those provided by the Gaia, MITHNEOS, NEO Surveyor, and SPHEREx surveys.
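Probabilistic class assignment of the kind described above reduces, in its simplest form, to Bayes' rule over per-class density models. The toy sketch below is not the published classification tool: the class labels borrow the complex names from the abstract, but the scalar feature, means, standard deviations, and priors are all made-up illustrations of how posterior class probabilities arise from a mixture model:

```python
# Toy analogue of probabilistic class assignment via Bayes' rule.
# All parameters (means, stds, priors) are hypothetical illustrations.
import numpy as np
from scipy.stats import norm

classes = {"C": (0.0, 1.0), "M": (2.0, 0.8), "S": (4.0, 1.2)}  # hypothetical (mean, std)
priors = {"C": 0.4, "M": 0.2, "S": 0.4}                        # hypothetical class frequencies

def class_probabilities(x):
    """Posterior p(class | x) for a scalar feature x."""
    scores = {c: priors[c] * norm(m, s).pdf(x) for c, (m, s) in classes.items()}
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()}

probs = class_probabilities(0.3)     # posterior over classes for one observation
```

In the real taxonomy the densities live in a learned lower-dimensional feature space and overlapping class densities naturally encode the degeneracies between classes, but the normalization step is the same.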
We present a novel family of deep neural architectures, named partially exchangeable networks (PENs), that leverage probabilistic symmetries. By design, PENs are invariant to block-switch transformations, which characterize the partial exchangeability properties of conditionally Markovian processes. Moreover, we show that any block-switch-invariant function has a PEN-like representation. The DeepSets architecture is a special case of PEN, so fully exchangeable data can also be targeted. We employ PENs to learn summary statistics in approximate Bayesian computation (ABC). Compared with previous deep learning methods for learning summary statistics, our results are highly competitive for both time series and static models; indeed, PENs provide more reliable posterior samples even when using less training data.
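The fully exchangeable special case mentioned above (DeepSets) makes the invariance concrete: each set element is encoded independently, the encodings are sum-pooled, and a second map produces the summary, so the output cannot depend on element order. The sketch below uses random placeholder weights and hypothetical layer sizes, not a trained network, purely to demonstrate the permutation invariance:

```python
# Minimal DeepSets-style sketch: per-element encoder phi, sum pooling,
# post-pooling decoder rho. Weights and sizes are arbitrary placeholders.
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(1, 8))    # per-element encoder weights (hypothetical)
W_rho = rng.normal(size=(8, 3))    # post-pooling decoder weights (hypothetical)

def deepsets_summary(x):
    """x: (n,) set of scalars -> (3,) permutation-invariant summary."""
    h = np.tanh(x[:, None] @ W_phi)    # encode each element independently
    pooled = h.sum(axis=0)             # sum pooling discards element order
    return np.tanh(pooled @ W_rho)

x = rng.normal(size=10)
s1 = deepsets_summary(x)
s2 = deepsets_summary(rng.permutation(x))
print(np.allclose(s1, s2))             # True: the summary ignores ordering
```

PENs generalize this by pooling over overlapping blocks of consecutive elements, which preserves the Markovian dependence structure while remaining invariant to block-switch transformations.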
Deep latent variable models (DLVMs) combine the approximation abilities of deep neural networks with the statistical foundations of generative models. Variational methods are commonly used for inference; however, the exact likelihood of these models has been largely overlooked. The purpose of this work is to study the general properties of this quantity and to show how they can be leveraged in practice. We focus on important inferential problems that rely on the likelihood: estimation and missing data imputation. First, we investigate maximum likelihood estimation for DLVMs: in particular, we show that most unconstrained models used for continuous data have an unbounded likelihood function, and we demonstrate that this problematic behaviour is a source of mode collapse. We also show how to ensure the existence of maximum likelihood estimates, and draw useful connections with nonparametric mixture models. Finally, we describe an algorithm for missing data imputation using the exact conditional likelihood of a deep latent variable model. On several data sets, our algorithm consistently and significantly outperforms the usual imputation scheme used for DLVMs.
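The unboundedness phenomenon invoked above has a classic finite-mixture analogue, which is also where the connection to nonparametric mixtures comes from: centre one Gaussian component on a single data point and let its variance shrink, and the log-likelihood diverges to +∞ even though the fit is degenerate. The sketch below reproduces that mechanism on a deterministic toy dataset; the DLVM-specific argument is omitted:

```python
# Classic unbounded-likelihood mechanism: a two-component mixture with a
# broad component plus a "spike" centred on data[0]. As the spike's std
# shrinks, the total log-likelihood grows without bound. The dataset and
# component parameters are arbitrary illustrations.
import numpy as np
from scipy.stats import norm

data = -2.45 + 0.1 * np.arange(50)    # 50 equally spaced toy observations

def mixture_loglik(data, sigma_small):
    """Log-likelihood of 0.5*N(0, 2^2) + 0.5*N(data[0], sigma_small^2)."""
    dens = 0.5 * norm(0.0, 2.0).pdf(data) + 0.5 * norm(data[0], sigma_small).pdf(data)
    return float(np.log(dens).sum())

logliks = [mixture_loglik(data, s) for s in (1e-2, 1e-4, 1e-6)]
print(logliks)    # strictly increasing: the likelihood is unbounded in sigma
```

Each hundredfold shrinkage of the spike's standard deviation adds roughly log(100) ≈ 4.6 to the log-likelihood through the data[0] term alone, while the other terms are essentially unaffected, so the supremum of the likelihood is infinite.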