For more than a century, methods for representing data and exploring its intrinsic structure have developed remarkably; they comprise both supervised and unsupervised methods. Recent years, however, have witnessed the flourishing of big data, where datasets are typically high-dimensional and the data can be messy, incomplete, unlabeled, or corrupted. Consequently, discovering the hidden structure buried in such data is highly challenging. From this perspective, exploratory data analysis plays a substantial role in learning the hidden structures that capture the significant features of the data in an ordered manner, by extracting patterns and testing hypotheses to identify anomalies. Unsupervised generative learning models are a class of machine learning models characterized by their ability to reduce dimensionality, discover the exploratory factors, and learn representations without any predefined labels; moreover, such models can generate data from the domain of the reduced factors. This survey introduces beginning researchers to recent unsupervised generative learning models for data exploration and representation learning. Specifically, it covers three families of methods, based on their usage in the era of big data: blind source separation, manifold learning, and neural networks, from shallow to deep architectures.