Online forums and social media platforms are increasingly being used to discuss topics of varying polarities where different people take different stances. Several methodologies for automatic stance detection from text have been proposed in literature. To our knowledge, there has not been any systematic investigation towards their reproducibility, and their comparative performances. In this work, we explore the reproducibility of several existing stance detection models, including both neural models and classical classifier-based models. Through experiments on two datasets-(i) the popular SemEval microblog dataset, and (ii) a set of health-related online news articles-we also perform a detailed comparative analysis of various methods and explore their shortcomings.
Microblogging sites like Twitter have become important sources of real-time information during disaster events. A large amount of valuable
situational information
is posted in these sites during disasters; however, the information is dispersed among hundreds of thousands of tweets containing sentiments and opinions of the masses. To effectively utilize microblogging sites during disaster events, it is necessary to not only
extract
the situational information from the large amounts of sentiments and opinions, but also to
summarize
the large amounts of situational information posted in real-time. During disasters in countries like India, a sizable number of tweets are posted in
local resource-poor languages
besides the normal English-language tweets. For instance, in the Indian subcontinent, a large number of tweets are posted in Hindi/Devanagari (the national language of India), and some of the information contained in such non-English tweets is not available (or available at a later point of time) through English tweets. In this work, we develop a novel classification-summarization framework which handles tweets in both English and Hindi—we first extract tweets containing situational information, and then summarize this information. Our proposed methodology is developed based on the understanding of how several concepts evolve in Twitter during disaster. This understanding helps us achieve superior performance compared to the state-of-the-art tweet classifiers and summarization approaches on English tweets. Additionally, to our knowledge, this is the first attempt to extract situational information from non-English tweets.
Linguistic research on multilingual societies has indicated that there is usually a preferred language for expression of emotion and sentiment (Dewaele, 2010). Paucity of data has limited such studies to participant interviews and speech transcriptions from small groups of speakers. In this paper, we report a study on 430,000 unique tweets from Indian users, specifically Hindi-English bilinguals, to understand the language of preference, if any, for expressing opinion and sentiment. To this end, we develop classifiers for opinion detection in these languages, and further classifying opinionated tweets into positive, negative and neutral sentiments. Our study indicates that Hindi (i.e., the native language) is preferred over English for expression of negative opinion and swearing. As an aside, we explore some common pragmatic functions of codeswitching through sentiment detection.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.