2010 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2010.5494890

Decision level combination of multiple modalities for recognition and analysis of emotional expression

Abstract: Emotion is expressed and perceived through multiple modalities. In this work, we model face, voice and head movement cues for emotion recognition and we fuse classifiers using a Bayesian framework. The facial classifier is the best performing, followed by the voice and head classifiers, and the multiple modalities seem to carry complementary information, especially for happiness. Decision fusion significantly increases the average total unweighted accuracy, from 55% to about 62%. Overall, we achieve average accuracy…
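The abstract describes fusing modality-specific classifiers (face, voice, head movement) at the decision level within a Bayesian framework. The paper's exact formulation is not reproduced on this page; a common variant of Bayesian decision fusion assumes the modalities are conditionally independent given the emotion class and combines the per-modality posteriors with the class prior. The sketch below illustrates that variant only, and every function and variable name in it is hypothetical.

```python
import numpy as np

def bayesian_decision_fusion(posteriors, prior):
    """Fuse per-modality class posteriors under a conditional-independence
    assumption: p(c | all modalities) ∝ prior(c) * prod_m [ p(c | m) / prior(c) ].

    posteriors : array of shape (n_modalities, n_classes); each row is one
                 classifier's posterior over the emotion classes.
    prior      : array of shape (n_classes,); class prior probabilities.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # Work in log space for numerical stability.
    log_fused = np.log(prior) + np.sum(np.log(posteriors) - np.log(prior), axis=0)
    fused = np.exp(log_fused - log_fused.max())
    return fused / fused.sum()

# Hypothetical example: face, voice and head classifiers scoring one utterance
# over {angry, happy, neutral, sad}.
face  = [0.10, 0.60, 0.20, 0.10]
voice = [0.20, 0.40, 0.25, 0.15]
head  = [0.25, 0.35, 0.25, 0.15]
prior = [0.25, 0.25, 0.25, 0.25]
print(bayesian_decision_fusion([face, voice, head], prior))
# The fused posterior favours "happy", the class the modalities agree on.
```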

Cited by 77 publications (57 citation statements)
References 12 publications
“…The results show that EUs contain different levels of emotion salience, demonstrating that certain EUs are more strongly associated with particular emotion classes (e.g., a rise in anger strongly suggests the class of "angry"). The classification results demonstrate that this method can achieve comparable results to the state of the art [15]. These findings suggest that there may exist basic building blocks that underlie expressions of emotion.…”
Section: Introduction (supporting)
confidence: 64%
“…The results demonstrate that variable-length units can effectively capture local dynamics and can be used in a classification framework to achieve results comparable to the state-of-the-art on the IEMOCAP database (62.42% [15]). The results also show that these units can be used to interpret the emotional dynamics of affective utterances.…”
Section: Discussion (mentioning)
confidence: 62%
“…We also compare our results with the maximal accuracy achieved from a previous work of Metallinou et al [37], which utilizes the same IEMOCAP database as our work and introduces a decision-level Bayesian fusion over models using face, voice, and head movement cues. Although Metallinou's work used a different subset of the IEMOCAP database, this comparison supports the strong performance of our proposed method.…”
Section: Baseline Models (mentioning)
confidence: 50%
“…This way of combining evidence led to overall improvements in our early work. However, much work on ensemble learning has demonstrated that for a variety of tasks this method of combination is not as powerful as decision-level combination (for example see Raaijmakers, Truong, & Wilson, 2008; van Halteren, Zavrel, & Daelemans, 1998; Metallinou, Lee, & Narayanan, 2010; Bertolami & Bunke, 2006). We treat the feature-level combination as the baseline for our experiments.…”
Section: Feature-level Combination (C1) (mentioning)
confidence: 99%
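The contrast drawn in the citation above, concatenating modality features into a single classifier versus training one classifier per modality and combining their decisions, can be sketched as follows. This is an illustrative comparison only, not the setup of the cited works: the synthetic data, the choice of logistic-regression classifiers, and the posterior-averaging fusion rule are all assumptions.

```python
# Illustrative contrast between feature-level and decision-level combination
# of two "modalities" (here just two feature subsets of a synthetic dataset).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pretend the first 10 features come from one modality, the rest from another.
mod_a, mod_b = slice(0, 10), slice(10, 20)

# Feature-level combination: one classifier over the concatenated features.
feat_clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc_feature = accuracy_score(y_te, feat_clf.predict(X_te))

# Decision-level combination: one classifier per modality, decisions fused by
# averaging posteriors (a simple stand-in for a Bayesian combination rule).
clf_a = LogisticRegression(max_iter=1000).fit(X_tr[:, mod_a], y_tr)
clf_b = LogisticRegression(max_iter=1000).fit(X_tr[:, mod_b], y_tr)
fused = (clf_a.predict_proba(X_te[:, mod_a]) +
         clf_b.predict_proba(X_te[:, mod_b])) / 2
acc_decision = accuracy_score(y_te, fused.argmax(axis=1))

print(f"feature-level accuracy:  {acc_feature:.3f}")
print(f"decision-level accuracy: {acc_decision:.3f}")
```

Which strategy wins depends on the data; the point of the cited comparison is that decision-level combination often performs at least as well while keeping each modality's classifier separately interpretable.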