Demand for and interest in big data analytics are increasing rapidly. Big data encompasses not only existing structured data but also various kinds of unstructured data such as text, images, videos, and logs. Among these types of unstructured data, text has gained particular attention because it is the most representative means of describing and delivering information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results can be presented through word clouds, word networks, topic modeling, document classification, and semantic analysis. In particular, there is growing demand to identify trending topics in the rapidly increasing volume of text generated on social media. Because topic modeling can extract core topics from large collections of unstructured text documents and group the documents by topic, it has been actively researched and applied in various fields. In this paper, we review the major techniques of and research trends in text analysis, and we introduce several applications that use topic modeling to solve problems in various fields.
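The five-stage order described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny in-memory corpus stands in for document collection, and the stop-word set, the regular-expression tokenizer, and the choice of raw term counts with cosine similarity are assumptions made for brevity. Production systems would typically use a language-specific tokenizer and weighting such as TF-IDF.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for"}


def parse_and_filter(text: str) -> list[str]:
    """Parsing and filtering: lowercase, split into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(tokens: list[str]) -> Counter:
    """Structuring and frequency analysis: a bag-of-words term-count vector."""
    return Counter(tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Similarity analysis: cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Document collection: a small hypothetical corpus stands in for crawled documents.
docs = [
    "Topic modeling extracts core topics from text data.",
    "Text data from social media reveal trending topics.",
    "Images and videos are unstructured data.",
]
vectors = [term_frequencies(parse_and_filter(d)) for d in docs]

print(round(cosine_similarity(vectors[0], vectors[1]), 3))  # → 0.5
```

The first two documents share four of their eight terms each, so they score higher than either does against the third document, which shares only "data". A word cloud or word network would typically be built from the same term-count vectors.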
Submitted: July 26, 2015; 1st Revision: September 1, 2015; Accepted: September 4, 2015

* Graduate School of Business IT, Kookmin University; ** Associate Professor, School of Management Information Systems, Kookmin University, corresponding author; *** Practical Strategy Research Institute (실전전략연구소)

Many users now frequently share their opinions on diverse issues through social media. Consequently, many governments have attempted to establish or improve national policies based on public opinion captured from these platforms. In this paper, we identify several limitations of traditional approaches to analyzing public opinion about science and technology and propose an alternative methodology that overcomes them. First, we separate a science and technology analysis phase from a social issue analysis phase, reflecting the observation that public opinion forms only when a particular science or technology is applied to a specific social issue. Next, we apply a start list and a stop list in succession to obtain clearer and more interesting results. Finally, to identify the documents most relevant to a given subject, we develop a new concept of a logical filter that consists not only of keywords but also of the logical relationships among them. We then assess the practical applicability of the proposed methodology by applying it to discover core issues and public opinions in 1,700,886 documents drawn from SNS, blog, news, and discussion sources.
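The abstract does not say how the start list, stop list, and logical filter are represented, so the following Python sketch rests on stated assumptions. It treats the start list as a whitelist of candidate terms, applies the stop list afterward to remove unwanted terms from what remains, and models the logical filter as a Boolean expression tree (AND, OR, NOT) over keywords, evaluated against each document's set of words. The class names `Term`, `And`, `Or`, and `Not`, the list contents, the example filter, and the sample documents are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (a simplistic stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


# --- Logical filter (assumed form): keywords plus logical relationships ---
@dataclass(frozen=True)
class Term:
    word: str

    def matches(self, words: set[str]) -> bool:
        return self.word in words


@dataclass(frozen=True)
class And:
    parts: tuple

    def matches(self, words: set[str]) -> bool:
        return all(p.matches(words) for p in self.parts)


@dataclass(frozen=True)
class Or:
    parts: tuple

    def matches(self, words: set[str]) -> bool:
        return any(p.matches(words) for p in self.parts)


@dataclass(frozen=True)
class Not:
    part: object

    def matches(self, words: set[str]) -> bool:
        return not self.part.matches(words)


# --- Start list, then stop list, applied in that order (assumed semantics) ---
START_LIST = {"nuclear", "plant", "safety", "radiation", "energy"}  # hypothetical
STOP_LIST = {"plant"}  # hypothetical: judged too generic to be an interesting issue term


def issue_terms(tokens: list[str]) -> list[str]:
    kept = [t for t in tokens if t in START_LIST]   # 1) keep only start-list terms
    return [t for t in kept if t not in STOP_LIST]  # 2) then drop stop-list terms


# Hypothetical filter: nuclear AND (safety OR radiation) AND NOT game
nuclear_safety = And((
    Term("nuclear"),
    Or((Term("safety"), Term("radiation"))),
    Not(Term("game")),
))

docs = [
    "Public concern about nuclear plant safety is growing",
    "Radiation levels near the nuclear plant were measured",
    "Nuclear safety tips in a new war game",
    "Solar energy policy debate",
]

relevant = [d for d in docs if nuclear_safety.matches(set(tokenize(d)))]
counts = Counter(t for d in relevant for t in issue_terms(tokenize(d)))
print(len(relevant))          # → 2
print(counts.most_common(1))  # → [('nuclear', 2)]
```

A plain keyword search for "nuclear" and "safety" would also return the third document. The NOT clause excludes it, which is the kind of precision a logical filter adds over mere keywords. Applying the stop list after the start list keeps the whitelist broad while still pruning terms such as "plant" from the final issue counts.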