Salient Features for Anger Recognition in German and English IVR Portals

Anger recognition in speech dialogue systems can help to enhance human-computer interaction. In this paper we report on the setup and performance optimization techniques for successful anger classification using acoustic cues. We evaluate the performance of a broad variety of features on both a German and an American English voice portal database, each of which contains "real", i.e. non-acted, continuous speech of narrow-band quality. Starting from a large-scale feature extraction, we determine optimal feature combinations for each language by applying an Information-Gain-based ranking scheme. Analyzing the ranking, we observe that a large proportion of the most promising features for both databases are derived from MFCCs and loudness. In contrast to this similarity, pitch features also proved important for the English database. We further compute classification scores for our setups using discriminative training and Support Vector Machine classification. The resulting systems show that anger recognition in both English and German can be handled very similarly, reaching comparable results.
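The pipeline described above, ranking acoustic features by Information Gain and then training an SVM on the top-ranked subset, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it uses scikit-learn's mutual-information estimator as the Information-Gain criterion and synthetic data in place of the actual acoustic feature vectors.

```python
# Sketch of Information-Gain-style feature ranking followed by SVM
# classification. Synthetic data stands in for the MFCC/loudness/pitch
# features of the voice portal databases (an assumption for illustration).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_features = 200, 20
X = rng.normal(size=(n_samples, n_features))
# Label (anger vs. non-anger) depends only on the first two features,
# so a good ranking should place them near the top.
y = (X[:, 0] + 0.5 * X[:, 1]
     + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

# Rank features by mutual information with the class label
# (mutual information is the Information-Gain criterion for ranking).
scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]

# Keep the top-k features and train an SVM on the reduced set
top_k = ranking[:5]
X_tr, X_te, y_tr, y_te = train_test_split(
    X[:, top_k], y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

In practice the ranking would be computed per language, and the optimal cutoff k selected by cross-validation on each database.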