Encoding Emotions in Speech with the Size Code

Chuenwattanapranithi, Suthathip; Xu, Yi; Thipakorn, Bundit; Maneewongvatana, Songrit

doi:10.1159/000192793

Cited by 49 publications

(57 citation statements)

References 63 publications


“…While there are some data on decoding emotions based on body language and vocal signals in humans (Bänziger, Grandjean, & Scherer, 2009; Chuenwattanapranithi, 2008; de Gelder, 2009), so far the vast majority of studies on emotion recognition have focused on facial expressions (see e.g. Breazeal, 2003; Ekman, 1993).…”

Section: Introduction (mentioning)

confidence: 99%

Humans attribute emotions to a robot that shows simple behavioural patterns borrowed from dog behaviour

Gácsi, Kis, Faragó, et al. 2016

Computers in Human Behavior


In social robotics it has been a crucial issue to determine the minimal set of relevant behaviour actions that humans interpret as social competencies. As a potential alternative to mimicking human abilities, it has been proposed to use a non-human animal, the dog, as a natural model for developing simple, non-linguistic emotional expressions for non-humanoid social robots. In the present study human participants were presented with short video sequences in which a PeopleBot robot and a dog displayed behaviours that corresponded to five emotional states (joy, fear, anger, sadness, and neutral) in a neutral environment. The actions of the robot were developed on the basis of dog expressive behaviours that had been described in previous studies of dog-human interactions. In their answers to open-ended questions, participants spontaneously attributed emotional states to both the robot and the dog. They could also successfully match all dog videos and all robot videos with the correct emotional state. We conclude that our bottom-up approach (starting from a simpler animal signalling system, rather than decomposing complex human signalling systems) can be used as a promising model for developing believable and easily recognisable emotional displays for non-humanoid social robots.

Highlights: Humans spontaneously attribute emotions to an ethologically inspired robot. Dog emotional videos prime the attribution of emotions to robot videos. Participants were able to match both dog and robot videos to the corresponding emotions. Experience with dogs does not help identify dog and robot emotions.


“…As can be observed from Fig. 15, although in the third row the two phonemes still have overlaps, some of them are attenuated, such as the upper part (21-24). Only two phonemes from two types of speech are analyzed here.…”

Section: Comparison Of Subbands Between Different Features (mentioning)

confidence: 87%

“…The differences between neutral speech and each type of emotion affected speech differ in different frequency area, especially in the low and high frequencies. According to the size code hypothesis [22], when speech is produced with the happy emotion, the vocal tract of the speaker tends to be shortened, and the F0 tend to be raised. The F-Ratio curves could reflect this affection slightly.…”

Section: The F-ratio Analysis For the Emotion Affected Speech (mentioning)

confidence: 99%

“…While the Mel FBE spectrograms may be able to space frequency more compactly for each speech unit, shown in the second row, in which the main part of spectrogram for each phoneme covers no more than half of the whole bands. However, for the Mel FBE spectrograms, there is much overlapped area between different phonemes, such as the upper part (15-24), and it may be difficult to distinguish them on the frame level for the emotion affected speech. The F-Ratio based FBE spectrograms are proposed to solve this problem by emphasizing the resolution of frequencies with more discrepancies between different speech units, while attenuating the resolution of frequencies with more overlap between different speech units.…”

Section: Comparison Of Subbands Between Different Features (mentioning)

confidence: 99%


Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

Sun, Zhou, Zhao, et al. 2010

IEICE Trans. Inf. & Syst.


SUMMARY: This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is utilized to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and the second formants for most of the vowels, should be emphasized in comparison to the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. Similar to the Mel-Frequency scale, another frequency scale called the F-Ratio-scale is thus proposed to optimize the filter bank design for the MFCC features, and make each subband contain equal significance for speech unit classification. Under comparable conditions, with the modified features we get a relative 43.20% decrease compared with the MFCC in sentence error rate for the emotion affected speech recognition, 35.54% and 23.03% for the noisy speech recognition at 15 dB and 0 dB SNR (signal to noise ratio) respectively, and 64.50% for the three years' 863 test data. The application of the F-Ratio analysis on the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.
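The abstract does not spell out how the F-Ratio is computed. As a rough illustration only, the sketch below uses the common textbook definition (variance of the class means divided by the average within-class variance, evaluated per frequency band); this definition, the function name `f_ratio_per_band`, and the toy band energies are all assumptions and may differ from what the cited paper actually does.

```python
from statistics import mean, pvariance


def f_ratio_per_band(classes):
    """Return one F-ratio per frequency band.

    `classes` maps a speech-unit label to a list of frames; each frame is a
    list of band energies. Assumed definition (not taken from the paper):
    F = variance of class means / mean of within-class variances.
    """
    n_bands = len(next(iter(classes.values()))[0])
    ratios = []
    for band in range(n_bands):
        # Collect this band's values separately for every class.
        per_class = [[frame[band] for frame in frames]
                     for frames in classes.values()]
        between = pvariance([mean(values) for values in per_class])
        within = mean([pvariance(values) for values in per_class])
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


# Hypothetical toy data: two speech units, two frames each, two bands.
toy = {
    "a": [[1.0, 5.0], [3.0, 5.2]],
    "i": [[1.5, 9.0], [2.5, 9.2]],
}
ratios = f_ratio_per_band(toy)
```

In this toy example the two classes have the same mean in the first band, so its ratio is zero, while the second band separates the classes cleanly and scores high. Under the assumed definition, bands with a high ratio are the ones a filter bank would allocate more resolution to.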


“…Potential support for this explanation can be derived from findings that prosodic cues indicating anger and happiness in human speech may also be related to the acoustic size code discussed earlier in this chapter. Human listeners perceive synthetic vowels created with a dynamically lower F0 and smaller formant dispersion as being spoken in an angry voice, whilst vowels with a dynamically higher F0 and a larger formant dispersion are perceived as being spoken in a happy voice (Chuenwattanapranithi et al., 2008). This potential universality in ritualisation across mammal vocalisations may allow dogs to generalise their responses to specific prosodic cues, aiding their perception of certain emotions in the human voice.…”

Section: Emotional Information (mentioning)

confidence: 99%
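The citation statements above link formant dispersion to apparent vocal tract length. A minimal sketch of that relationship follows, assuming the usual definition of formant dispersion as the mean spacing between adjacent formants and an idealized uniform tube closed at one end, for which that spacing equals c / (2L). The function names, the speed-of-sound value, and the formant values are illustrative assumptions, not the cited authors' method or data.

```python
def formant_dispersion(formants_hz):
    """Mean spacing between adjacent formants: (F_N - F_1) / (N - 1)."""
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants")
    ordered = sorted(formants_hz)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


def apparent_tract_length_cm(dispersion_hz, speed_of_sound_cm_s=35000.0):
    """Length of an idealized uniform tube whose formant spacing is c / (2L).

    The uniform-tube model and the default speed of sound are simplifying
    assumptions for illustration.
    """
    return speed_of_sound_cm_s / (2.0 * dispersion_hz)


# Hypothetical formant sets: a neutral-like vowel and one with wider spacing.
neutral = formant_dispersion([500.0, 1500.0, 2500.0])
wider = formant_dispersion([550.0, 1650.0, 2750.0])
neutral_length = apparent_tract_length_cm(neutral)
wider_length = apparent_tract_length_cm(wider)
```

With these made-up values the wider formant spacing maps to a shorter apparent tract, which matches the direction described in the quotes: larger dispersion (shorter tract) for happy-sounding vowels and smaller dispersion (longer tract) for angry-sounding ones.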