This paper systematically examines the machine learning literature from 1968 to 2017 to identify and analyze research trends. The content collection is built from machine learning journals of the well-established publishers ScienceDirect, Springer, JMLR, and IEEE (approximately 23,365 journal articles). To the best of our knowledge, this is the first effort to analyze trends in machine learning research with topic models: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and LDA with Coherent Model (LDA_CM). The LDA_CM topic model gives the highest topic coherence among the topic models under consideration. This study provides a scientific basis that helps overcome the subjectivity of collective opinion. The Mann-Kendall test is used to determine the trend of each topic. Our findings indicate paradigmatic shifts in research methodology, significant patterns of topical prominence, and evolving research areas. They highlight how research topics in machine learning have evolved from earlier to recent trends. Understanding this intellectual structure and future trends will help researchers follow the divergent developments of this research in one place. This paper analyzes the overall trends of machine learning research since 1968, based on the latent topics identified in the period 2007 to 2017, which may help researchers explore the recommended areas and publish their research articles.
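The Mann-Kendall trend test mentioned in this abstract can be sketched in a few lines. The version below is a minimal illustration, not the authors' implementation: it assumes a yearly series without tied values (the tie correction to the variance is omitted) and uses the normal approximation for the p-value. The function name `mann_kendall` and the toy series are hypothetical.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Simplifying assumption: no tie correction is applied to the variance.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")

    # S counts later-minus-earlier pairs: +1 for an increase, -1 for a decrease.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis of no trend (no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


# Made-up yearly share of one topic, for illustration only.
toy_topic_share = [0.010, 0.012, 0.015, 0.019, 0.022, 0.026, 0.031]
s, z, p = mann_kendall(toy_topic_share)
trend = "increasing" if z > 0 else "decreasing" if z < 0 else "no trend"
print(s, round(z, 3), round(p, 4), trend)
```

A positive Z with a small p-value would mark a topic as rising over the period, and a negative Z as declining. The significance threshold is a choice left to the analyst.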
Abstract-Topic modeling techniques have primarily been used to mine topics from text corpora. These techniques reveal the hidden thematic structure in a collection of documents and enable new ways to browse, search, and summarize large archives of text. A topic is a group of words that frequently occur together. A topic model can connect words with similar meanings and distinguish between uses of words with several meanings. Here we present a survey of the development of topic modeling techniques, comprising Latent Dirichlet Allocation (LDA) based and non-LDA based techniques; we classify the techniques this way because LDA has dominated topic modeling since its inception. We use three hierarchical classification criteria for topic models: LDA based or non-LDA based, bag-of-words or sequence-of-words approach, and unsupervised or supervised learning. The purpose of this survey is to explore topic modeling techniques from the Singular Value Decomposition (SVD) topic model to the latest topic models in deep learning. It also provides a brief summary of current probabilistic topic models as well as motivation for future research.
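The bag-of-words versus sequence-of-words distinction used as a classification criterion can be illustrated with a small sketch. This is only an assumed, simplified reading of the two representations (whitespace tokenization, bigrams standing in for word order); the helper names `bag_of_words` and `word_bigrams` are hypothetical and do not come from the survey.

```python
from collections import Counter


def bag_of_words(text):
    """Count each token and discard word order entirely."""
    return Counter(text.lower().split())


def word_bigrams(text):
    """Keep local word order as a list of adjacent token pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


doc_a = "topic models mine text"
doc_b = "text models mine topic"

# Same words, different order: identical bags, different sequences.
print(bag_of_words(doc_a) == bag_of_words(doc_b))
print(word_bigrams(doc_a) == word_bigrams(doc_b))
```

A bag-of-words model treats the two toy documents as the same input, while a sequence-aware model can tell them apart.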