Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion

Murray, Iain R.; Arnott, John L.

doi:10.1121/1.405558

Cited by 874 publications

(494 citation statements)

References 0 publications

Supporting

Mentioning

456

Contrasting

Unclassified

“…That has been called the palette theory (Scherer, 1984b). The term primary is still widely used, in lay parlance and in the speech literature (e.g., Murray and Arnott, 1993); but in fact, ‘palette’ theories have very little support in modern emotion research (Ekman, 1999).…”

Section: Lists Of Key Emotion Categories

mentioning

confidence: 99%

Describing the emotional states that are expressed in speech

Cowie

Cornelius

2003

Speech Communication

505

273

To study relations between speech and emotion, it is necessary to have methods of describing emotion. Finding appropriate methods is not straightforward, and there are difficulties associated with the most familiar. The word emotion itself is problematic: a narrow sense is often seen as "correct", but it excludes what may be key areas in relation to speech, including states where emotion is present but not full-blown, and related states (e.g., arousal, attitude). Everyday emotion words form a rich descriptive system, but it is intractable because it involves so many categories, and the relationships among them are undefined. Several alternative types of description are available. Emotion-related biological changes are well documented, although reductionist conceptions of them are problematic. Psychology offers descriptive systems based on dimensions such as evaluation (positive or negative) and level of activation, or on logical elements that can be used to define an appraisal of the situation. Adequate descriptive systems need to recognise the importance of both time course and interactions involving multiple emotions and/or deliberate control. From these conceptions of emotion come various tools and techniques for describing particular episodes. Different tools and techniques are appropriate for different purposes.

“…Williams and Stevens (1972) concluded that the pitch contour is the best indicator of the emotional content of an utterance. In their review of the literature, Murray and Arnott (1993) noted that the most commonly referenced vocal parameters are pitch (i.e., both the average value and range of the fundamental frequency), duration, intensity, and the undefined term voice quality.…”

mentioning

confidence: 99%

Perceiving affect from the voice and the face

Massaro

Egan

1996

Psychonomic Bulletin & Review

243

163

This experiment examines how emotion is perceived by using facial and vocal cues of a speaker. Three levels of facial affect were presented using a computer-generated face. Three levels of vocal affect were obtained by recording the voice of a male amateur actor who spoke a semantically neutral word in different simulated emotional states. These two independent variables were presented to subjects in all possible permutations (visual cues alone, vocal cues alone, and visual and vocal cues together), which gave a total set of 15 stimuli. The subjects were asked to judge the emotion of the stimuli in a two-alternative forced choice task (either HAPPY or ANGRY). The results indicate that subjects evaluate and integrate information from both modalities to perceive emotion. The influence of one modality was greater to the extent that the other was ambiguous (neutral). The fuzzy logical model of perception (FLMP) fit the judgments significantly better than an additive model, which weakens theories based on an additive combination of modalities, categorical perception, and influence from only a single modality.

Research has shown that we use multiple sources of information when we comprehend speech (Massaro, 1987b, 1989; Massaro & Cohen, 1990). Visual information from a speaker's face, for example, can strongly influence speech perception, especially when the auditory information is degraded: in one study, recognition of auditory sentences in noisy environments improved from 23% to 65% when the perceivers could also see the speaker's face (Summerfield, 1979). We also use multiple sources of information when we perceive a speaker's emotion. These sources include a variety of paralinguistic signals, as well as the speech's verbal content. The emotion may be interpreted in different ways, depending on the voice quality, facial expression, and body language of the speaker.

To study the degree to which paralinguistic sources of information are used, it is important that one first define these sources and then determine how they are evaluated and integrated. In the present study, in order to investigate the perception of a speaker's emotion, two sources of paralinguistic information were varied: facial expressions and vocal cues.

Facial expressions are an effective means of communicating emotion. Darwin (1872) argued that facial expressions originate in basic acts of self-preservation common to human beings and other animals, and that these expressions are related to the emotional states that they convey. Research by Meltzoff and Moore (1977) suggests that we are biologically prepared from birth to respond to facial expressions. They produced evidence which showed that

“…‘Anxious’ utterances show segments that are shorter than average, with exception of voiceless plosives. Also in (Murray and Arnott, 1993), relations were shown between the emotion state and the duration of vowels and consonants. But in nearly all studies pitch and energy are the most commonly applied features to distinguish and classify emotion state (Murray and Arnott, 1993), or anyway to convey supra-textual information.…”

Section: Emotion and Asr Affective Computing

mentioning

confidence: 99%

“…Also in (Murray and Arnott, 1993), relations were shown between the emotion state and the duration of vowels and consonants. But in nearly all studies pitch and energy are the most commonly applied features to distinguish and classify emotion state (Murray and Arnott, 1993), or anyway to convey supra-textual information. In Slaney and McRoberts, 1998, a study was conducted to automatically classify an utterance (spoken by a parent to a young infant) into three classes: approval, attention and prohibition.…”

Section: Emotion and Asr Affective Computing

mentioning

confidence: 99%

Emotions, speech and the ASR framework

Bosch

2003

Speech Communication

Automatic recognition and understanding of speech are crucial steps towards natural human-machine interaction. Apart from the recognition of the word sequence, the recognition of properties such as prosody, emotion tags or stress tags may be of particular importance in this communication process. This paper discusses the possibilities to recognize emotion from the speech signal, primarily from the viewpoint of automatic speech recognition (ASR). The general focus is on the extraction of acoustic features from the speech signal that can be used for the detection of the emotional state or stress state of the speaker. After the introduction, a short overview of the ASR framework is presented. Next, we discuss the relation between recognition of emotion and ASR, and the different approaches found in the literature that deal with the correspondence between emotions and acoustic features. The conclusion is that automatic emotional tagging of the speech signal is difficult to perform with high accuracy, but prosodic information is nevertheless potentially useful to improve the dialogue handling in ASR tasks on a limited domain.
