In this paper, we present an experiment to identify emotions in tweets. Unlike previous studies, which typically use the six basic emotion classes defined by Ekman, we classify emotions according to a set of eight basic bipolar emotions defined by Plutchik (Plutchik's "wheel of emotions"). This allows us to treat the inherently multi-class problem of emotion classification as a binary problem for four opposing emotion pairs. Our approach applies distant supervision, which has been shown to be an effective way to overcome the need for a large set of manually labeled data to produce accurate classifiers. We build on previous work by treating not only emoticons and hashtags but also emoji, which are increasingly used in social media, as an alternative to explicit, manual labels. Since these labels may be noisy, we first perform an experiment to investigate the correspondence among particular labels of different types assumed to be indicative of the same emotion. We then test and compare the accuracy of independent binary classifiers for each of Plutchik's four binary emotion pairs trained with different combinations of label types. Our best performing classifiers achieve accuracies between 75% and 91%, depending on the emotion pair; these classifiers can be combined to emulate a single multi-label classifier for Plutchik's eight emotions that achieves accuracies superior to those reported in previous multi-way classification studies.
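The distant-supervision step described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical marker lexicon (the hashtags, emoticons, and emoji actually used in the paper are not listed in this abstract) and simple whitespace tokenization. Each tweet receives at most one label per opposing pair, and tweets carrying markers for both sides of a pair are left unlabeled for that pair.

```python
# Plutchik's four opposing emotion pairs.
PAIRS = [
    ("joy", "sadness"),
    ("anger", "fear"),
    ("trust", "disgust"),
    ("anticipation", "surprise"),
]

# Hypothetical marker lexicon for illustration only; not the paper's lists.
MARKERS = {
    "joy": {"#happy", ":)", "😂"},
    "sadness": {"#sad", ":(", "😢"},
    "anger": {"#angry", "😡"},
    "fear": {"#scared", "😱"},
    "trust": {"#trust"},
    "disgust": {"#disgusting"},
    "anticipation": {"#cantwait"},
    "surprise": {"#surprised", "😮"},
}


def distant_labels(tweet):
    """Return {pair: emotion} for every pair with an unambiguous marker."""
    tokens = set(tweet.lower().split())
    labels = {}
    for first, second in PAIRS:
        has_first = bool(tokens & MARKERS[first])
        has_second = bool(tokens & MARKERS[second])
        # Keep the tweet for this pair only if exactly one side is marked.
        if has_first != has_second:
            labels[(first, second)] = first if has_first else second
    return labels


def strip_markers(tweet):
    """Remove label markers so a classifier cannot simply memorize them."""
    all_markers = set().union(*MARKERS.values())
    return " ".join(t for t in tweet.split() if t.lower() not in all_markers)
```

Under these assumptions, each pair yields its own training set for an independent binary classifier, and the four predictions for a new tweet can be read together as a multi-label output.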
This paper describes the Linguistic Annotation Framework under development within ISO TC37 SC4 WG1. The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.
In this paper, we describe a means for automatically building very large neural networks (VLNNs) from definition texts in machine-readable dictionaries, and demonstrate the use of these networks for word sense disambiguation. Our method brings together two earlier, independent approaches to word sense disambiguation: the use of machine-readable dictionaries and spreading activation models. The automatic construction of VLNNs enables real-size experiments with neural networks for natural language processing, which in turn provide insight into their behavior and design and can lead to possible improvements.
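The idea of building a network from definition texts and spreading activation through it can be sketched on a toy scale. The dictionary entries, stop-word list, update rule, and parameter values below are all assumptions made for illustration; this is not the authors' network, and it omits refinements such as inhibition between competing senses.

```python
from collections import defaultdict

# Assumed stop-word list; function words are left out of the network.
STOP = {"a", "an", "for", "with", "such", "as", "the", "of"}

# Hypothetical toy dictionary: headword -> {sense id: definition text}.
TOY_DICT = {
    "pen": {
        "pen/1": "an instrument for writing with ink",
        "pen/2": "an enclosure for animals such as sheep",
    },
}


def build_network(dictionary):
    """Link each word to its senses and each sense to its definition words."""
    neighbors = defaultdict(set)
    for word, senses in dictionary.items():
        for sense, definition in senses.items():
            neighbors[word].add(sense)
            neighbors[sense].add(word)
            for token in definition.lower().split():
                if token not in STOP:
                    neighbors[sense].add(token)
                    neighbors[token].add(sense)
    return neighbors


def spread(neighbors, context, steps=2, decay=0.5, weight=0.5):
    """Synchronously propagate activation from the context words."""
    act = {n: (1.0 if n in context else 0.0) for n in neighbors}
    for _ in range(steps):
        act = {
            n: decay * act[n] + weight * sum(act[m] for m in neighbors[n])
            for n in neighbors
        }
    return act


def disambiguate(word, context, dictionary, steps=2):
    """Pick the sense of `word` with the highest activation."""
    neighbors = build_network(dictionary)
    act = spread(neighbors, set(context) | {word}, steps)
    return max(dictionary[word], key=lambda sense: act[sense])
```

With this toy dictionary, `disambiguate("pen", ["sheep"], TOY_DICT)` selects `pen/2`, because the context word feeds activation directly into that sense's definition.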
This paper describes an experiment that uses translation equivalents derived from parallel corpora to determine sense distinctions that can be used for automatic sense-tagging and other disambiguation tasks. Our results show that sense distinctions derived from cross-lingual information are at least as reliable as those made by human annotators. Because our approach is fully automated through all its steps, it could provide a means to obtain large samples of "sense-tagged" data without the high cost of human annotation.
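As a rough illustration of how translation equivalents can induce sense distinctions, the sketch below groups occurrences of an ambiguous word by the translations aligned to them in other languages. The input format, the example data, and the exact-match grouping rule are assumptions for illustration; the abstract does not specify the procedure used in the experiment.

```python
from collections import defaultdict


def group_by_translation(occurrences):
    """Group occurrence ids that share the same translations.

    `occurrences` is a list of (occurrence_id, {language: translation})
    pairs, assumed to come from a word-aligned parallel corpus.
    """
    groups = defaultdict(list)
    for occurrence_id, translations in occurrences:
        # Occurrences with identical translation profiles share a group.
        key = tuple(sorted(translations.items()))
        groups[key].append(occurrence_id)
    return dict(groups)


# Hypothetical aligned occurrences of English "bank".
EXAMPLE = [
    ("s1", {"fr": "banque", "de": "Bank"}),
    ("s2", {"fr": "rive", "de": "Ufer"}),
    ("s3", {"fr": "banque", "de": "Bank"}),
]
```

Here `group_by_translation(EXAMPLE)` places `s1` and `s3` in one group and `s2` in another, and each group can then be treated as one automatically derived sense tag.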