Discovering inherent correlations and hot research topics across disciplines from massive collections of scientific documents is important for understanding research trends. The LDA (Latent Dirichlet Allocation) topic model can find topics in large data sets, but the number of topics must be specified before clustering. For data sets whose structure is unknown, choosing this number is largely arbitrary. Therefore, this paper introduces the Hierarchical Dirichlet Process (HDP) to perform topic clustering with discipline division. The resulting topics are discrete sets of words with no semantic relations among them. To address this problem, this paper proposes a method for finding relationships between topic words in order to extract discipline hotspots. The method consists of classifying topics by the co-occurrence of subject words, constructing a co-word network, and analyzing discipline hotspots with weak co-occurrence theory. The experimental results indicate that the Hierarchical Dirichlet Process can mine topic word sets and does so more effectively than the LDA topic model. The co-word network based on weak tie theory can effectively find discipline hotspots, explicitly reflecting the research hotspots and inherent connections of disciplines.
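As an illustration of the co-word network step, the following minimal Python sketch counts how often pairs of subject words appear in the same document and then selects low-weight edges. It is not the paper's implementation: the function names `build_coword_network` and `weak_ties`, the toy keyword lists, and the `max_weight` cutoff are all hypothetical, and reading "weak" co-occurrence as "low co-occurrence count" is an assumption made only for this example.

```python
from collections import Counter
from itertools import combinations


def build_coword_network(docs):
    """Count how often each unordered keyword pair shares a document."""
    edges = Counter()
    for keywords in docs:
        # Deduplicate and sort so every pair has a single canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def weak_ties(edges, max_weight=1):
    """Keep pairs whose count is at most max_weight (an assumed cutoff)."""
    return {pair: w for pair, w in edges.items() if w <= max_weight}


# Toy input: one keyword list per document (made up for illustration).
docs = [
    ["topic model", "clustering", "HDP"],
    ["topic model", "clustering", "LDA"],
    ["co-word network", "clustering"],
]

edges = build_coword_network(docs)
weak = weak_ties(edges)
```

In this toy input, the pair ("clustering", "topic model") co-occurs in two documents and is therefore excluded from `weak`, while every other pair co-occurs only once. A real pipeline would take its keyword lists from the topic word sets produced by the topic model rather than from hand-written lists.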