2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6638345

Identifying salient sub-utterance emotion dynamics using flexible units and estimates of affective flow

Abstract: Emotion recognition is the process of identifying the affective characteristics of an utterance given either static or dynamic descriptions of its signal content. This requires the use of units, windows over which the emotion variation is quantified. However, the appropriate time scale for these units is still an open question. Traditionally, emotion recognition systems have relied upon units of fixed length, whose variation is then modeled over time. This paper takes the view that emotion is expressed over un…
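
The fixed-length units that the abstract describes as the traditional approach can be illustrated with a minimal sketch. This is not the paper's method: the function names, the 0.5-second default unit length, and the mean-energy descriptor are all assumptions chosen only to show what "windows over which the emotion variation is quantified" might look like in code.

```python
def fixed_length_units(samples, sample_rate, unit_seconds=0.5):
    """Split a 1-D sequence of samples into non-overlapping fixed-length units.

    Hypothetical helper for illustration; unit_seconds=0.5 is an assumed
    default, not a value taken from the paper. A trailing remainder shorter
    than one full unit is dropped.
    """
    unit_len = int(round(unit_seconds * sample_rate))
    if unit_len <= 0:
        raise ValueError("unit_seconds * sample_rate must be at least one sample")
    last_start = len(samples) - unit_len
    return [samples[i:i + unit_len] for i in range(0, last_start + 1, unit_len)]


def unit_energy(unit):
    """Mean squared amplitude of one unit: a stand-in per-unit descriptor."""
    return sum(x * x for x in unit) / len(unit)


if __name__ == "__main__":
    # Toy signal: 11 samples at 4 Hz, so each 0.5 s unit holds 2 samples.
    toy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    units = fixed_length_units(toy, sample_rate=4, unit_seconds=0.5)
    energies = [unit_energy(u) for u in units]
    print(len(units), energies)
```

A variable-length scheme, the alternative the abstract motivates, would replace the constant `unit_len` with boundaries chosen from the signal itself; the sketch deliberately stops short of that because the truncated abstract does not say how those boundaries are found.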

Cited by 37 publications (17 citation statements)
References 17 publications

“…The unweighted recall associated with the PS/Phon and PS/TRA schemes are not statistically significantly different (p<0.05). The importance of variable-length segmentation was also demonstrated in previous work [22]. However, this is the first demonstration of the efficacy of unsupervised variable-length segmentation for raw facial movements.…”
Section: Variable-length Segmentation (mentioning)
confidence: 53%
“…The selection of suitable time for the audio segment is a challenging problem in this era. Many researchers have worked, how to select a suitable time for each speech segment which has found some reasonable solution, that a segment of a speech signal is longer than 260ms that have more information to recognize the emotions in his/her speech [33], [34]. In this paper, we have done different observations on multiple frame durations to optimally select 500ms window size to convert single utterance into several segments.…”
Section: A. Pre-processing and Sequence Selection (mentioning)
confidence: 99%
“…[11] Hence, the lack of IS purity in natural expressions of IS should be considered when designing systems to recognize IS of non-stereotypical speech. In emotion profile theory [17], [18], [19], [20], [21], [22], [23], emotion profile recently has been demonstrated that it can interpret the emotion content of ambiguous utterances (i.e., containing shades of multiple affective classes). Inspired by the emotion profiling technique, the IS profile representation is proposed in this study to better interpret the ISs of ambiguous utterances and increase the discriminative abilities for IS recognition in non-stereotypical speech.…”
Section: IS Profile Representation for IS Recognition (mentioning)
confidence: 99%