Spoken Dialogue Systems Technology and Design 2010
DOI: 10.1007/978-1-4419-7934-6_4

Salient Features for Anger Recognition in German and English IVR Portals

Abstract: Anger recognition in speech dialogue systems can help to enhance human computer interaction. In this paper we report on the setup and performance optimization techniques for successful anger classification using acoustic cues. We evaluate the performance of a broad variety of features on both a German and an American English voice portal database which contain "real" speech, i.e. non-acted, continuous speech of narrow-band quality. Starting with a large-scale feature extraction, we determine optimal sets of fe…

Cited by 13 publications (11 citation statements)
References 25 publications
“…On the basis of a voiced, unvoiced and silence segmentation we calculate ratios of features from these segments both separately and jointly. A more detailed description of the feature setup can be found in Polzehl et al. (2010).…”
Section: Feature Definition
confidence: 99%
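The segment-ratio idea quoted above can be sketched in a few lines. This is a minimal illustration in plain NumPy, not the authors' actual feature set: it assumes a per-frame descriptor (here energy) plus boolean voiced/silence masks, and the function and key names are hypothetical.

```python
import numpy as np

def segment_ratio_features(energy, voiced, silence):
    """Mean of a frame-level descriptor per segment class, plus their ratio.

    energy  : 1-D array of per-frame energy (or any frame-level descriptor)
    voiced  : boolean mask, True for voiced frames
    silence : boolean mask, True for silent frames
    """
    speech = ~silence
    v_mean = float(energy[voiced & speech].mean())
    u_mean = float(energy[~voiced & speech].mean())
    return {
        "voiced_mean": v_mean,
        "unvoiced_mean": u_mean,
        "voiced_unvoiced_ratio": v_mean / u_mean,
    }

# Example: voiced frames carry twice the energy of unvoiced ones.
feats = segment_ratio_features(
    np.array([2.0, 2.0, 1.0, 1.0, 0.0]),
    np.array([True, True, False, False, False]),
    np.array([False, False, False, False, True]),
)
# feats["voiced_unvoiced_ratio"] -> 2.0
```

Computing such ratios "separately and jointly" then just means applying the same functional over each mask individually and over their union.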
“…We extract audio descriptors using a 10 ms frame shift, and derive statistics from these descriptors at the utterance level. Overall, we generate about 1450 features, which we have successfully used in our previous work on emotion recognition [6]. Features are extracted using Praat.…”
Section: Signal-based Speech Analysis
confidence: 99%
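The step from frame-level descriptor tracks to utterance-level statistics can be sketched as follows. This is an assumed illustration, not the paper's implementation: applying a handful of functionals (mean, std, min, max, range, …) to each of many descriptor tracks is how feature counts on the order of 1450 arise.

```python
import numpy as np

def utterance_statistics(descriptors):
    """Collapse frame-level descriptor tracks (e.g. computed at a 10 ms
    shift) into a flat dict of utterance-level statistics."""
    feats = {}
    for name, track in descriptors.items():
        x = np.asarray(track, dtype=float)
        feats[f"{name}_mean"] = float(x.mean())
        feats[f"{name}_std"] = float(x.std())
        feats[f"{name}_min"] = float(x.min())
        feats[f"{name}_max"] = float(x.max())
        feats[f"{name}_range"] = float(x.max() - x.min())
    return feats

stats = utterance_statistics({"pitch": [100.0, 150.0, 200.0]})
# stats["pitch_mean"] -> 150.0, stats["pitch_range"] -> 100.0
```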
“…Best combinations of features were discovered afterwards, when more features were taken into account and subsets were compiled by diverse methods [6].…”
Section: Introduction
confidence: 99%
“…For our classifier, we will rely on prosodic and acoustic features, in line with findings on salient features reported in related work [6,7]. We build on previous work on emotion recognition [12], and extract audio descriptors such as 16 MFCC coefficients, 5 formant frequencies, intensity, pitch, perceptual loudness [13], zero-crossing rate, harmonics-to-noise ratio, center of spectral mass gravity (centroid), the 95% roll-off point of spectral energy and the spectral flux, etc., using a 10 ms frame shift. From these descriptors, we derive statistics at the utterance level, separately for voiced and unvoiced regions, on speech parts only.…”
Section: Signal-based Speech Analysis
confidence: 99%
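Two of the listed descriptors have compact closed forms, sketched below in plain NumPy as an illustration (not the Praat implementation the cited work uses): the zero-crossing rate as the fraction of adjacent samples with differing sign, and the spectral centroid ("center of spectral mass gravity") as the magnitude-weighted mean frequency.

```python
import numpy as np

def zero_crossing_rate(frame):
    # fraction of adjacent sample pairs whose sign differs
    return float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))

def spectral_centroid(frame, sr):
    # magnitude-weighted mean frequency of the frame's spectrum
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float((freqs * mag).sum() / mag.sum())

sr = 8000                      # narrow-band rate, as in the portal data
t = np.arange(256) / sr
tone = np.sin(2 * np.pi * 1000 * t)   # pure 1 kHz tone, integer cycles
centroid = spectral_centroid(tone, sr)  # close to 1000.0 Hz
```

For a pure tone with an integer number of cycles in the frame, the centroid lands on the tone's frequency; for real speech frames it summarizes where the spectral energy is concentrated.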
“…A more detailed description of the total of 1450 features is beyond the scope of this paper, and can be found in [12].…”
Section: Signal-based Speech Analysis
confidence: 99%