2015 International Conference on Affective Computing and Intelligent Interaction (ACII)
DOI: 10.1109/acii.2015.7344645

Emotion recognition in spontaneous and acted dialogues

Abstract: In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent…

Cited by 41 publications (39 citation statements). References 35 publications.
“…The overall classification framework has been shown in Figure 1. Previous studies have concluded that the performance of the LSTM model can be enhanced by using more predictive and knowledge-inspired features despite the limited training examples [18,22,23]. Therefore, LSTM is a natural choice for us to use with features generated by VAEs.…”
Section: Speech Emotion Classification Using LSTM
confidence: 99%
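To make the LSTM-based setup referenced in that excerpt concrete, below is a minimal sketch of an LSTM classifier over per-word feature sequences. It is an illustration only, not the cited authors' exact model: the feature dimensionality, sequence length, layer size, and placeholder data are all assumptions.

# Minimal sketch (assumptions noted above): an LSTM classifier over padded
# per-word feature sequences of shape (n_utterances, max_len, n_features),
# predicting one of three emotion classes per utterance.
import numpy as np
import tensorflow as tf

n_features = 5   # e.g. one value per DIS-NV feature type (assumption)
max_len = 50     # maximum utterance length in words (assumption)
n_classes = 3    # low / mid / high on one emotion dimension

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len, n_features)),
    tf.keras.layers.Masking(mask_value=0.0),   # ignore zero-padded timesteps
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data standing in for extracted feature sequences and labels.
rng = np.random.default_rng(0)
X = rng.random((8, max_len, n_features)).astype("float32")
y = rng.integers(0, n_classes, size=8)
model.fit(X, y, epochs=1, verbose=0)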
“…IEMOCAP data were also annotated on three continuous dimensions: Arousal (A), Power (P), and Valence (V). For comparison of our classification results with the state-of-the-art approaches in [18,22], we also consider the above emotion dimensions. However, to maintain it as a classification problem, like [18,22], within each dimension we created three categories: low (values less than 3), mid (values equal to 3) and high (values greater than 3).…”
Section: Speech Corpus
confidence: 99%
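A small sketch of the three-way categorisation described in that excerpt: continuous dimension ratings are mapped to low (below 3), mid (equal to 3), and high (above 3). The example ratings are hypothetical.

# Map continuous Arousal/Power/Valence ratings to three categories.
def categorise(rating: float) -> str:
    if rating < 3:
        return "low"
    if rating > 3:
        return "high"
    return "mid"

ratings = [2.5, 3.0, 4.0]                 # hypothetical Arousal values
print([categorise(r) for r in ratings])   # ['low', 'mid', 'high']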
“…(verbal filled pauses), stutters, laughter, and audible breath (remaining words are labelled as general lexicons). DIS-NVs were shown to be indicators of speaker emotions in spontaneous dialogue [10]. To evaluate annotation agreement, we divide the annotations into six subsets based on the DIS-NV labels and compute CC of the word timings in each subset.…” (footnote URL in the original text: https://www.ibm.com/watson/developercloud/speech-to-text.html)
Section: Transcription and Affective Cue Annotation
confidence: 99%
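Reading "CC" as a correlation coefficient between annotators (an assumption; the excerpt does not spell out the exact agreement measure), the per-subset check could look like the following sketch, with hypothetical DIS-NV labels and word timings.

# Split word-level annotations by DIS-NV label and compute the correlation
# of word timings between two annotators within each subset.
import numpy as np

labels = ["filled_pause", "general", "laughter", "general", "filled_pause"]
timings_a = np.array([0.12, 0.55, 1.10, 1.60, 2.05])  # annotator A onsets (s)
timings_b = np.array([0.10, 0.57, 1.08, 1.62, 2.07])  # annotator B onsets (s)

for lab in sorted(set(labels)):
    idx = [i for i, l in enumerate(labels) if l == lab]
    if len(idx) < 2:
        continue  # correlation is undefined for a single word
    cc = np.corrcoef(timings_a[idx], timings_b[idx])[0, 1]
    print(f"{lab}: CC = {cc:.3f}")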
“…Moreover, cues of perceived emotions in movies may be used for the recognition of induced emotions as well. Thus, we add manual transcripts of the LIRIS-ACCEDE movies, as well as expert annotations of DISfluency and Non-verbal Vocalisations (DIS-NV) in dialogues [10] and aesthetic highlights [11]. In addition, as a comparison with movie based features, we extract physiological and behavioural features based on signals collected from wearable sensors attached to the audience [12].…”
Section: Introduction
confidence: 99%