Many decision problems are set in changing environments. For example, the optimal investment in cyber maintenance depends on whether there is evidence of an unusual vulnerability, such as “Heartbleed,” that is causing an especially high rate of incidents. This creates a need for timely information to update decision models so that optimal policies can be generated for each decision period. Social media provide a streaming source of relevant information, but that information must be efficiently transformed into numbers to enable the needed updates. This article explores the use of social media as an observation source for timely decision making. To efficiently generate the observations for Bayesian updates, we propose a novel computational method for fitting an existing clustering model, called k-means latent Dirichlet allocation (KLDA). We illustrate the method using a cybersecurity problem: many organizations ignore “medium” vulnerabilities identified during periodic scans, and decision makers must choose whether staff should be required to address these vulnerabilities during periods of elevated risk. We also study four text corpora with 100 replications and show that KLDA achieves significantly reduced computational times and more consistent model accuracy.
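The abstract does not detail KLDA's internals. A minimal sketch of the underlying idea — using k-means clustering of bag-of-words count vectors as a fast surrogate for LDA-style topic assignment — might look like the following; the toy corpus, the deterministic seeding, and all function names are illustrative assumptions, not the paper's actual algorithm:

```python
from collections import Counter

def doc_vectors(docs, vocab):
    """Bag-of-words count vectors over a fixed vocabulary."""
    return [[Counter(d.split())[w] for w in vocab] for d in docs]

def kmeans(vectors, k, iters=20):
    """Plain k-means with squared Euclidean distance; centroids are
    seeded from the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for each document.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: recompute each centroid as the member mean.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

docs = [
    "patch vulnerability scan patch",
    "vulnerability scan incident patch",
    "game score team win",
    "team game win score",
]
vocab = sorted({w for d in docs for w in d.split()})
labels = kmeans(doc_vectors(docs, vocab), k=2)
# Documents about the same subject should land in the same cluster.
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

The appeal of a k-means pass is that hard cluster assignments are far cheaper to compute than full posterior inference over topic distributions, which is consistent with the reduced computational times the abstract reports.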
Freestyle text data such as surveys, complaint transcripts, customer ratings, or maintenance squawks can provide critical information for quality engineering. Exploratory text data analysis (ETDA) is proposed here as a special case of exploratory data analysis (EDA) for quality improvement problems with freestyle text data. The ETDA method seeks to extract useful information from the text data to identify hypotheses for additional exploration relating to key inputs or outputs. The proposed four steps of ETDA are: (1) preprocessing of text data, (2) text data analysis and display, (3) salient feature identification, and (4) salient feature interpretation. Five examples illustrate the method.
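The abstract does not elaborate on the four steps. A minimal sketch of step (1), preprocessing, feeding into the term frequencies that step (2) might display, could look like the following; the stop-word list, the complaint texts, and the tokenization rule are illustrative assumptions:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "was", "it"}  # illustrative subset

def preprocess(text):
    """Step 1: lowercase, strip punctuation, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Step 2 (analysis and display) could start from simple term frequencies,
# whose extremes are candidate salient features for steps 3 and 4.
complaints = [
    "The pump was leaking and the seal failed.",
    "Seal failed again; pump replaced.",
]
counts = Counter(t for c in complaints for t in preprocess(c))
print(counts.most_common(3))  # most frequent terms across complaints
```

Terms such as "pump" and "seal" recurring across complaint records are exactly the kind of salient feature that steps (3) and (4) would flag and interpret as hypotheses about a quality problem.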
Due to the rapid growth of large-scale text data from Internet sources such as Twitter, social media platforms have become popular sources from which to extract information. The extracted text is converted to numbers through a series of data transformations and then analyzed with text analytics models for decision-making problems. Among these models, one of the most common is latent Dirichlet allocation (LDA), a topic model in which topics are clusters of words in the documents, associated with fitted multivariate statistical distributions. However, these models are often poor estimators of topic proportions. Hence, this paper proposes a timely topic score technique for social media text data visualization, based on a point system built on topic model outputs to support text signaling. This importance score system is intended to mitigate the weakness of topic models by employing the topic proportion outputs and assigning importance points to present text topic trends. The technique then generates visualization tools that show topic trends over the studied time period and further facilitate decision making. Finally, this paper studies two real-life case examples from Twitter text sources and illustrates the efficiency of the methodology.
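The abstract leaves the point system unspecified. One way such a score could work — awarding points to each document's highest-ranked topics and summing per period to expose a trend — is sketched below; the 3-point/1-point scheme and the toy proportions are illustrative assumptions, not the paper's actual scoring rule:

```python
# Per-document topic proportions by period, as a topic model might output.
periods = {
    "week1": [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    "week2": [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
}

def topic_points(proportions, points=(3, 1)):
    """Assign points to each document's top-ranked topics and sum
    per topic, de-emphasizing the noisy raw proportion estimates."""
    scores = [0] * len(proportions[0])
    for doc in proportions:
        ranked = sorted(range(len(doc)), key=lambda t: -doc[t])
        for pts, topic in zip(points, ranked):
            scores[topic] += pts
    return scores

trend = {p: topic_points(docs) for p, docs in periods.items()}
for period, scores in trend.items():
    print(period, scores)  # one bar per topic per period in a trend chart
```

Because only topic ranks contribute points, the score is less sensitive to the unreliable magnitudes of estimated topic proportions, which matches the mitigation the abstract describes.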
Social media is now widely used. A large amount of information spreads through Twitter, especially given that U.S. President Trump has used Twitter as his main official free news publication outlet. Social media platforms like Twitter have therefore become important sources from which to extract information, which can then be analyzed with text analytics models for decision-making problems. In this paper, we first review several text analytics methods and then investigate multiple tweet-retrieval methods and software packages: Twitter Analytics, Application for Twitter, Python plus Tweepy, and Next Analytics. Seven feature-related criteria are applied to compare the methods on ease of use, extraction timing, and capability to accommodate big data. Although our results may be approximate, because we might not have observed all the capabilities and features of each package, they show that Python plus Tweepy is best suited to big data projects (millions of tweets or more) and real-time text data extraction. Next Analytics can retrieve historical text messages more conveniently through Excel and can reach further back in time, which gives it better capabilities for retrospective social media analysis.