Clustering is a core problem in a wide range of research disciplines, from machine learning and data mining to classical statistics. One group of clustering approaches, the so-called nonparametric methods, aims to cluster a set of entities into a number of clusters that is unknown and not specified beforehand, making a potentially expensive pre-analysis of the data obsolete. In this paper, the infinite Restricted Boltzmann Machine (iRBM), recently introduced by Côté and Larochelle, which has the ability to self-regulate its number of hidden parameters, is adapted to the problem of clustering by the introduction of two basic cluster membership assumptions. A descriptive study of the influence of several regularization and sparsity settings on the clustering behavior is presented, and the results are discussed. The results show that sparsity is a key adaptation when using the iRBM for clustering, improving both the clustering performance and the number of identified clusters.
CCS CONCEPTS • Computing methodologies → Unsupervised learning; Cluster analysis;
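As a rough illustration of the self-regulation idea, the sketch below computes an iRBM-style posterior over the number of active hidden units z for one input, using a per-unit penalty in the spirit of Côté and Larochelle, and then assigns the input to the most active unit among the first z. The argmax membership rule, the truncation to a fixed list of instantiated units, and all function and parameter names (`posterior_over_z`, `assign_cluster`, `W`, `c`, `beta`) are assumptions made for illustration; the abstract does not spell out the two membership assumptions used in the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pre_activations(v, W, c):
    # One pre-activation per hidden unit: W[j] . v + c[j].
    return [sum(w * x for w, x in zip(row, v)) + cj for row, cj in zip(W, c)]


def posterior_over_z(v, W, c, beta):
    # p(z | v) is proportional to exp(sum_{j<=z} (softplus(pre_j) - beta_j)).
    # The visible-bias term is the same for every z and cancels out.
    # Assumption: only the len(W) instantiated units are considered.
    scores, total = [], 0.0
    for a, b in zip(pre_activations(v, W, c), beta):
        total += softplus(a) - b
        scores.append(total)
    m = max(scores)  # subtract the maximum before exponentiating
    weights = [math.exp(s - m) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]


def assign_cluster(v, W, c, beta):
    # Hypothetical membership rule: pick the most probable z, then the
    # most active hidden unit among the first z units.
    p_z = posterior_over_z(v, W, c, beta)
    z = p_z.index(max(p_z)) + 1
    probs = [sigmoid(a) for a in pre_activations(v, W, c)[:z]]
    return probs.index(max(probs))
```

With penalties `beta` larger than ln 2, a unit whose pre-activation is near zero lowers the score of every z that includes it, which is one way to see why such units are effectively switched off.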
In this paper, we describe the infinite replicated Softmax model (iRSM) as an adaptive topic model that combines the infinite restricted Boltzmann machine (iRBM) and the replicated Softmax model (RSM). In our approach, the iRBM extends the RBM by enabling its hidden layer to adapt to the data at hand, while the RSM allows for modeling a low-dimensional latent semantic representation of a corpus. The combination of the two results in a method that is able to self-adapt to the number of topics within the document corpus and hence renders manual identification of the correct number of topics superfluous. We propose a hybrid training approach to effectively improve the performance of the iRSM. An empirical evaluation is performed on a standard data set, and the results are compared to those of a baseline topic model. The results show that the iRSM adapts its hidden layer size to the data and, when trained in the proposed hybrid manner, outperforms the base RSM model.
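To make the RSM component more concrete, the following minimal sketch shows the two conditionals of a standard replicated Softmax model on a bag-of-words count vector: hidden (topic) activations with the hidden bias scaled by document length, and a softmax distribution over the vocabulary given the hidden state. It covers only the plain RSM, not the iRSM or the hybrid training described above, and the function names and toy parameter layout are assumptions for illustration.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hidden_probs(counts, W, a):
    # p(h_j = 1 | counts) = sigmoid(D * a_j + sum_k W[j][k] * counts[k]),
    # where D is the document length (total word count).
    D = sum(counts)
    return [
        sigmoid(D * aj + sum(w * n for w, n in zip(row, counts)))
        for row, aj in zip(W, a)
    ]


def word_distribution(h, W, b):
    # Softmax over the vocabulary: logit_k = b_k + sum_j h_j * W[j][k].
    logits = [
        bk + sum(hj * row[k] for hj, row in zip(h, W))
        for k, bk in enumerate(b)
    ]
    m = max(logits)  # subtract the maximum before exponentiating
    exps = [math.exp(l - m) for l in logits]
    norm = sum(exps)
    return [e / norm for e in exps]
```

Scaling the hidden bias by the document length keeps the hidden activations comparable across documents of different lengths, which is what lets one set of weights be shared by all word positions.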