The vast volume of information produced by social media platforms in general, and microblogging in particular, has spawned a slew of new applications and driven the rise and expansion of sentiment analysis research. We propose a sentiment analysis technique that identifies the main parts that describe a tweet's intent and enriches them with relevant words, phrases, or even inferred variables. We adopt a state-of-the-art hybrid deep learning model that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network to classify tweets by polarity. To preserve the latent relationships between tweet terms and their expanded representation, we use sentence encoding and contextualized word embeddings. To investigate the performance of tweet embeddings on the sentiment analysis task, we tested several context-free models (Word2Vec, Sentence2Vec, GloVe, and FastText), a dynamic embedding model (BERT), deep contextualized word representations (ELMo), and an entity-based model (Wikipedia). The results show that text enrichment improves the accuracy of sentiment polarity classification by a notable margin.
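As a rough illustration of the hybrid architecture described above, the following minimal sketch stacks a 1-D convolution over token embeddings and feeds the resulting feature sequence to an LSTM before a polarity classifier. It assumes PyTorch; the class name `CnnLstmClassifier`, all layer sizes, and the trainable `nn.Embedding` layer (a stand-in for the pretrained embeddings the abstract lists) are hypothetical choices, not the authors' configuration.

```python
import torch
import torch.nn as nn


class CnnLstmClassifier(nn.Module):
    """Hypothetical CNN + LSTM polarity classifier (all sizes are assumptions)."""

    def __init__(self, vocab_size=5000, embed_dim=64, conv_channels=32,
                 kernel_size=3, hidden_dim=32, num_classes=2):
        super().__init__()
        # Stand-in for pretrained embeddings such as Word2Vec, GloVe, or BERT.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # The convolution extracts local n-gram features along the token axis.
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size,
                              padding=kernel_size // 2)
        # The LSTM models longer-range order over the convolved features.
        self.lstm = nn.LSTM(conv_channels, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # (batch, seq, embed_dim)
        x = x.transpose(1, 2)           # (batch, embed_dim, seq) for Conv1d
        x = torch.relu(self.conv(x))    # (batch, conv_channels, seq)
        x = x.transpose(1, 2)           # (batch, seq, conv_channels)
        _, (h_n, _) = self.lstm(x)      # h_n: (num_layers, batch, hidden_dim)
        return self.out(h_n[-1])        # (batch, num_classes) logits


model = CnnLstmClassifier()
batch = torch.randint(1, 5000, (4, 20))  # 4 dummy tweets, 20 token ids each
logits = model(batch)
```

Placing the convolution before the LSTM is one common ordering for such hybrids; the abstract does not state which ordering or hyperparameters were used.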
Users of social media may use words and phrases literally to convey their views or opinions clearly. However, some people choose to use idioms or proverbs, which are implicit and indirect, to make a stronger impression on the audience or to catch its attention with a funny, sarcastic, or metaphorical phrase. Idioms and proverbs are figurative expressions that form a thematically coherent whole and cannot be understood literally. In previous work, an extension of IBM's Sentiment Lexicon of Idiomatic Expressions was proposed to include around 9,000 idioms; both lexicons were manually annotated through a crowdsourcing service. In this research, we therefore propose a knowledge-based expansion approach that avoids human annotation of idioms. For sentiment classification, the proposed method has the advantage of not requiring any fine-tuning of the BERT model. Experimental comparisons show that automated idiom enrichment and annotation substantially benefit the performance of the sentiment classifier. The expanded annotated lexicon will be made publicly available.
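The idea of enriching text with idiom knowledge can be sketched as follows. This is only an illustrative toy, not the authors' method: the three-entry lexicon `IDIOM_GLOSSES`, the function name `enrich_with_idiom_glosses`, and the choice to append literal glosses in brackets are all assumptions made for the example.

```python
import re

# Hypothetical mini-lexicon mapping idioms to literal glosses. A real system
# would draw these from a knowledge source rather than a hand-written dict.
IDIOM_GLOSSES = {
    "over the moon": "extremely happy",
    "under the weather": "feeling ill",
    "piece of cake": "very easy",
}


def enrich_with_idiom_glosses(text, lexicon=IDIOM_GLOSSES):
    """Append the literal gloss of every known idiom found in the text."""
    lowered = text.lower()
    found = []
    for idiom, gloss in lexicon.items():
        # Match the idiom as a whole phrase, ignoring case.
        if re.search(r"\b" + re.escape(idiom) + r"\b", lowered):
            found.append(gloss)
    if not found:
        return text
    return text + " [" + "; ".join(found) + "]"
```

For instance, `enrich_with_idiom_glosses("I am over the moon today")` returns the original tweet followed by `[extremely happy]`, so a downstream encoder sees an explicit sentiment-bearing paraphrase alongside the figurative phrase.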