Face and Gesture 2011
DOI: 10.1109/fg.2011.5771359

Benchmarking classification models for emotion recognition in natural speech: A multi-corporal study

Abstract: A significant amount of the research on automatic emotion recognition from speech focuses on acted speech produced by professional actors. This approach often leads to overoptimistic results, as recognizing emotion in real-life conditions is more challenging due to the prevalence of mixed and less intense emotions in natural speech. The paper presents an empirical study of the most widely used classifiers in the domain of emotion recognition from speech, across multiple non-acted emotional …
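To make the kind of comparison described in the abstract concrete, below is a minimal, hypothetical sketch of how a cross-validated benchmark of widely used classifiers might be set up with scikit-learn. The synthetic feature matrix, emotion labels, classifier list, and hyperparameters are illustrative assumptions only; they do not reproduce the paper's corpora, acoustic feature set, or experimental settings.

```python
# Hedged sketch: cross-validated comparison of common classifiers on
# utterance-level acoustic feature vectors. All data below is synthetic
# placeholder material, not the corpora used in the paper.
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Stand-in for per-utterance features (e.g. pitch/energy/MFCC statistics)
# with three hypothetical emotion classes.
X = rng.normal(size=(300, 24))
y = rng.integers(0, 3, size=300)

classifiers = {
    "SVM (RBF)": SVC(kernel="rbf", C=1.0),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(max_depth=10, random_state=0),
    "Naive Bayes": GaussianNB(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scale features within each fold
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    print(f"{name:15s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

With real corpora, the feature extraction step and the choice of evaluation metric (e.g. unweighted average recall for imbalanced emotion classes) would matter at least as much as the choice of classifier.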

Cited by 12 publications (7 citation statements)
References 41 publications
“…For each metric, annotators were required to label each output on a five-point Likert scale, where a score of zero signified poor quality, no usefulness, and a task that is not efficient, and a score of 4 signified high quality descriptions that contain clear and effective text that is easy to extract meaning from, that are useful in describing a topic, and that perform the annotation task in the best possible manner with the least amount of effort. The crowdsourcing of labelling natural language often uses a limited number of annotators with the expectation that they are perceived to be experts [50]. However, the task of annotating text is considered as being highly subjective and varies with the annotator's age, gender, experience, cultural location, and individual psychological differences [51].…”
Section: Evaluating Interpretability From a Human Perspective
confidence: 99%
“…Studies on emotional expressions have in the past also relied on natural, real-life emotions, e.g., Tarasov and Delany (2011). Enacted emotions as in the EMODB are sometimes criticized as being exaggerated or prototypical and, thus, less ecologically valid.…”
Section: Study Design, Stimuli
confidence: 99%
“…For example, in human-machine interaction, better responses can be made if the emotional state of the human can be recognized. Existing work on this in the literature mainly focuses on developing models for assigning labels like "pleasing", "angry" and "neutral" to the data, e.g., [36], [37], [38], [39]. Most of those efforts are supervised in nature, i.e., the ground truth labeling for the training data is required.…”
Section: Additional Validation Using Speech Data
confidence: 99%