Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.715

CancerEmo: A Dataset for Fine-Grained Emotion Detection

Abstract: Emotions are an important element of human nature, often affecting the overall wellbeing of a person. Therefore, it is no surprise that the health domain is a valuable area of interest for emotion detection, as it can provide medical staff or caregivers with essential information about patients. However, progress on this task has been hampered by the absence of large labeled datasets. To this end, we introduce CANCEREMO, an emotion dataset created from an online health community and annotated with eight fine-…

Cited by 22 publications (18 citation statements)
References 34 publications
“…As reference, we evaluate also content-based attribute classification using the tweets posted by each user as relevant evidence. In this case, the tweets authored by each user, t_u, are first converted into a 300-dimension text embedding vector using the pre-trained convolutional FastText neural model (Joulin et al 2017), which is often used for tweet processing (Sosea and Caragea 2020). We average the FastText tweet embeddings (Adi et al 2017), and feed the resulting user representation to the logistic regression network.…”
Section: Methods (mentioning)
confidence: 99%
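The pipeline described in the statement above (embed each tweet, average the tweet embeddings per user, classify with logistic regression) can be sketched roughly as follows. This is a minimal illustration, not the citing authors' implementation: word_vector is a hypothetical stand-in that returns a fixed pseudo-random vector per word instead of loading pre-trained FastText vectors, and the toy users, labels, learning rate and epoch count are all assumptions made for the example.

```python
import math
import random

DIM = 300  # embedding size named in the quoted passage


def word_vector(word, dim=DIM):
    # Hypothetical stand-in for a pre-trained word embedding: a fixed
    # pseudo-random vector per word (NOT real FastText vectors).
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def mean_vector(vectors):
    # Element-wise average of equally sized vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embed_tweet(tweet):
    # Tweet embedding = average of its word vectors.
    return mean_vector([word_vector(w) for w in tweet.lower().split()])


def embed_user(tweets):
    # User representation = average of the user's tweet embeddings.
    return mean_vector([embed_tweet(t) for t in tweets])


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=200):
    # Plain full-batch gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# Toy data (invented for illustration): 1 = outdoors-themed, 0 = finance-themed.
users = [
    ["love hiking mountains", "trail running sunrise"],
    ["hiking trail weekend", "mountains camping"],
    ["stock market earnings", "interest rates bonds"],
    ["market rates inflation", "earnings bonds report"],
]
labels = [1, 1, 0, 0]

X = [embed_user(tweets) for tweets in users]
w, b = train_logreg(X, labels)
predictions = [predict(w, b, x) for x in X]
```

Averaging gives every user a fixed-length vector regardless of how many tweets they posted, which is what lets a simple linear classifier consume the result.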
“…These votes are then replaced with new votes by asking the questions from other individuals. A similar method of labeling was used by (Sosea and Caragea, 2020) and was shown to obtain satisfactory results. However, it is important to keep in mind that the perception of emotions can be quite subjective.…”
Section: Labeling (mentioning)
confidence: 95%
“…(Gollapalli et al, 2020) introduces an unsupervised emotion detection method which is built upon word co-occurrences and word associations. Some emotion datasets include: (Sosea and Caragea, 2020) from an English online health community with a focus on cancer, (Demszky et al, 2020) from English Reddit comments, from long-form narratives in English, (Kumar et al, 2019) from Hindi stories, and (Almahdawi and Teahan, 2019) an Arabic dataset from Facebook posts written in the Iraqi dialect. While (Khosravi et al, 2019) uses machine learning models to detect emotions of Persian news texts, the dataset has not been published.…”
Section: Related Work (mentioning)
confidence: 99%
“…Stanford Sentiment Treebank (SST) (Socher et al, 2013) CancerEmo (Sosea and Caragea, 2020) is a sentence-level multilabel dataset of 8,500 sentences labeled with the eight Plutchik (Plutchik, 1980) basic emotions from an Online Health Community for people suffering from diseases such as cancer.…”
Section: Datasets and Lexicons (mentioning)
confidence: 99%
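To make the "sentence-level multilabel" description above concrete, here is a small sketch of how a sentence's emotion labels could be encoded as a multi-hot vector over the eight Plutchik basic emotions. The label spellings, their ordering, and the helper names encode and decode are assumptions for illustration; they are not taken from the CancerEmo release.

```python
# The eight Plutchik basic emotions; spelling and order are assumed here.
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
]
INDEX = {name: i for i, name in enumerate(EMOTIONS)}


def encode(labels):
    # Multi-hot vector: 1 where the emotion is annotated, 0 elsewhere.
    vector = [0] * len(EMOTIONS)
    for label in labels:
        if label not in INDEX:
            raise ValueError(f"unknown emotion label: {label!r}")
        vector[INDEX[label]] = 1
    return vector


def decode(vector):
    # Inverse of encode: names of the emotions that are switched on.
    return [name for name, bit in zip(EMOTIONS, vector) if bit]


# A sentence may carry several emotions at once, or none at all.
example = encode(["fear", "sadness"])
```

Because the labels are not mutually exclusive, a classifier for this kind of data typically makes eight independent yes/no decisions per sentence rather than picking a single class.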
“…There are numerous studies that focus on emotion detection (Demszky et al, 2020; Desai et al, 2020; del Arco et al, 2020; Sosea and Caragea, 2020; Majumder et al, 2019; Mohammad and Kiritchenko, 2018; Abdul-Mageed and Ungar, 2017; Mohammad and Kiritchenko, 2015; Mohammad, 2012; Strapparava and Mihalcea, 2008) and sentiment analysis (Yin et al, 2020; Tian et al, 2020; Phan and Ogunbona, 2020; Zhai and Zhang, 2016; Chen et al, 2016; Liu, 2012; Glorot et al, 2011; Pang and Lee, 2005). Various lexicons have been used to improve model performance on these tasks.…”
Section: Introduction (mentioning)
confidence: 99%