Unified models of sentiment and topic have been widely employed in unsupervised sentiment analysis, under the assumption that each word in a text carries both sentiment and topic information. In practice, however, some words tend to express objective content while others tend to express subjective sentiment. Based on this observation, the concept of word bias, comprising objective bias and subjective bias, is first put forward. Considering the relations among bias, sentiment, and topic, a unified framework named the Bias-Sentiment-Topic (BST) model is then presented to jointly model them for microblog sentiment analysis. Next, an improved Gibbs sampler is proposed for BST inference by introducing the general Pólya urn model, which incorporates word embeddings as general knowledge. Finally, experiments on standard test datasets show major improvements of BST in sentiment classification and demonstrate its effectiveness in separating words with different biases.
KEYWORDS
bias, Pólya urn model, sentiment analysis, topic model, word embedding
INTRODUCTION
In recent years, with the rapid development of social media platforms, more and more users have turned to services such as microblogs to express their sentiments and opinions, generating huge amounts of data online each day. By using sentiment analysis to mine customers' satisfaction with brands, companies can adjust their market strategy in time to improve their competitiveness [1]. Analysis of public sentiment toward stocks and politicians in tweets can be applied to predicting stock markets and election results [2,3]. Sentiment analysis of social networks can also help to better understand user behavior and discover sentiment-related patterns [4-6]. Compared with traditional long texts, microblog texts are short, informal, and semantically rich but feature-sparse. Sentiment analysis of microblog data has therefore become an important research area.
Unified models of sentiment and topic have been widely employed in sentiment analysis, such as the Joint Sentiment-Topic (JST) model [7], the Aspect and Sentiment Unification Model (ASUM) [8], and the Joint Aspect-based Sentiment Topic (JAST) model [9]. JST achieves joint analysis of sentiment and topic by building an additional sentiment layer on top of latent Dirichlet allocation (LDA) [10]. ASUM extends JST by assuming that each sentence in a review text belongs to a single sentiment and a single aspect. JAST not only mines sentiment and topic but also separates and extracts aspect and opinion information. These models have several problems. (1) They consider only that sentiments and topics depend on each other, whereas sentiments are related not only to topics but also to the subjectivity and objectivity of words. (2) They cannot take full advantage of emoticons, which convey the most typical emotional features of microblogs.
(3) They just focus on the analysis of test datasets, not incorporating the general knowledge ...