Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.575

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Abstract: We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other…

Citations: cited by 22 publications (17 citation statements)
References: 28 publications (29 reference statements)
“…The fairly low agreement among the annotators indicates that the task of detecting Ekman's emotions in tweets is challenging and/or subjective. This is in line with previous studies which reported that emotion detection from text is a complex task that results in low inter-annotator agreements regardless of the emotion taxonomy used (Alm et al, 2005; Schuff et al, 2017; Kim and Klinger, 2018; Bostan and Klinger, 2018; Öhman et al, 2020; Acheampong et al, 2020).…”
Section: Annotation Analysis (supporting)
confidence: 91%
“…Several recent surveys (Acheampong et al, 2020; Alswaidan and Menai, 2020) and studies (Öhman et al, 2020; Bostan et al, 2020; Bostan and Klinger, 2018; Schuff et al, 2017) list previous work on emotion detection from texts and emphasise their differences in type of emotion taxonomy, task (single-label or multi-label), size of the dataset, text genre, granularity, topics, system architectures, and best results obtained with systems for automatic detection of emotions in texts. However, none of the studies focussed on assessing the quality of benchmark datasets, or the influence of methods used for obtaining gold labels on the results of systems for automatic emotion detection from texts.…”
Section: Related Work (mentioning)
confidence: 99%
“…As a result, we adopt the continual pre-training method for the masked language model (MLM) (Devlin et al, 2018b) in this Track 2 to directly improve final downstream performance. The available datasets are chosen from the open-source resources (Demszky et al, 2020; Öhman et al, 2020). The optimization function is written as follows…”
Section: Continuing Pre-training (mentioning)
confidence: 99%
“…There is a Python library called FinMeter (Hämäläinen and Alnajjar, 2019b) that has some higher level semantic tools for Finnish such as metaphor interpretation, word concreteness analysis and sentiment analysis. Sentiment analysis for Finnish has also been studied later on (Öhman et al, 2020; Vankka et al, 2019; Lindén et al, 2020). There is also research on topic modeling methods (Ginter et al, 2009; Hengchen et al, 2018; Loukasmäki and Makkonen, 2019).…”
Section: Semantics (mentioning)
confidence: 99%