Audio-Visual Affect Recognition in Activation-Evaluation Space

Zeng, Zhihong; Zhang, Z.; Pianfetti, B.; Tu, Jilin; Huang, Thomas S.

doi:10.1109/icme.2005.1521551

Cited by 13 publications (5 citation statements)

References 9 publications


“…Yu et al (2004) classified user engagement in social telephone conversations between friends along arousal and valence scales that were discretized into 5 levels. Kim et al (2005), Zeng et al (2005) and Wöllmer et al (2009) classified emotions in the 4 emotion quadrants of the arousal-valence space. Instead of classifying emotions on discretized scales of arousal and valence, some studies have taken up the challenge to classify emotions on continuous scales of arousal and valence.…”

Section: Related Work (mentioning; confidence: 99%)
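To make the discretized formulations in this excerpt concrete, here is a minimal Python sketch. It assumes (this is not stated in the source) that arousal and valence estimates are already normalized to [-1, 1]; the function names `quadrant` and `discretize` and the quadrant labels are hypothetical. It maps a continuous estimate to one of the four arousal-valence quadrants, or bins one axis into 5 levels; equal-width bins are an assumption of this sketch, not something the cited studies specify.

```python
def quadrant(arousal: float, valence: float) -> str:
    """Map a point in arousal-valence space to one of four quadrants.

    Assumes both values lie in [-1, 1]; zero is treated as the positive side
    (an arbitrary tie-breaking choice for this sketch).
    """
    a = "high-arousal" if arousal >= 0 else "low-arousal"
    v = "positive" if valence >= 0 else "negative"
    return f"{a}/{v}"


def discretize(value: float, levels: int = 5) -> int:
    """Bin a value in [-1, 1] into `levels` equal-width bins numbered 0..levels-1."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    # Scale to [0, levels]; clamp so that exactly 1.0 lands in the top bin.
    idx = int((value + 1.0) / 2.0 * levels)
    return min(idx, levels - 1)


if __name__ == "__main__":
    print(quadrant(0.7, -0.4))  # high-arousal/negative
    print([discretize(x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)])  # [0, 1, 2, 3, 4]
```

Continuous-scale approaches, the alternative the excerpt mentions, would skip both functions and regress the arousal and valence values directly.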

Speech-based recognition of self-reported and observed emotion in a dimensional space
Truong, Leeuwen, Jong (2012). Speech Communication.



“…• Audio-Visual Classification may be useful in identifying affective cues such as vocal inflections or prosodic features [180], [188], [223].…”

Section: Affect Recognition (mentioning; confidence: 99%)
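Prosodic cues such as loudness and pitch are the kind of audio features this excerpt refers to. As a rough illustration only (not the feature extractor of any cited work), the sketch below computes per-frame energy and a naive autocorrelation-based pitch (F0) estimate. The function names are hypothetical, and the 60 to 400 Hz search range is an assumed default for speech.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame: a simple loudness cue."""
    return sum(x * x for x in frame) / len(frame)


def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Naive F0 estimate (Hz) from the autocorrelation peak.

    Searches lags corresponding to pitches between fmin and fmax and returns
    sample_rate / best_lag, or 0.0 if no positive correlation is found
    (treated here as an unvoiced or silent frame).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


if __name__ == "__main__":
    sr = 8000
    # A synthetic 200 Hz tone, 0.1 s long (800 samples).
    tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(800)]
    print(round(frame_energy(tone), 3), autocorr_pitch(tone, sr))
```

On real speech this raw peak-picking is prone to octave errors; practical systems add normalization, voicing decisions and smoothing across frames.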

“…Among the audiovisual affect recognition approaches, Busso et al [190], Petridis and Pantic [184] and Schuller et al [183] have employed feature-level fusion, which concatenates multimodal features and passes them through a single affect recognition model or classifier. Decision-level fusion, on the other hand, has been used by Hoch et al [255], Go et al [187], Pal et al [185], Wang and Guan [180], Zeng et al [178], Zeng et al [256], and Zeng et al [223]. However, decision-level fusion ignores the inherent correlation between these multi-modal features.…”

Section: G. Audio-visual Analysis in Affect Recognition (mentioning; confidence: 99%)
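The two fusion strategies contrasted in this excerpt can be sketched side by side. This is a minimal illustration under stated assumptions: the linear scorers, the toy feature values, and all function names are hypothetical stand-ins for whatever classifiers the cited works used. Feature-level fusion concatenates the audio and video feature vectors and scores them with one classifier; decision-level fusion classifies each modality separately and then combines the per-class posteriors (here with a weighted sum rule).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(weights, x):
    """One score per class: dot product of that class's weight row with x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def feature_level_fusion(audio_feats, video_feats, joint_weights):
    """Concatenate both modalities and score them with a single classifier."""
    joint = list(audio_feats) + list(video_feats)
    return softmax(linear_scores(joint_weights, joint))


def decision_level_fusion(p_audio, p_video, w_audio=0.5):
    """Weighted sum rule over per-modality class posteriors."""
    return [w_audio * pa + (1.0 - w_audio) * pv for pa, pv in zip(p_audio, p_video)]


if __name__ == "__main__":
    audio = [0.2, 0.8]       # toy prosodic features
    video = [0.5, 0.1, 0.4]  # toy facial features
    # Hypothetical weights: 2 classes x 5 concatenated features.
    w_joint = [[1.0, -0.5, 0.3, 0.0, 0.2], [-1.0, 0.5, -0.3, 0.1, 0.0]]
    print(feature_level_fusion(audio, video, w_joint))

    # Decision level: each modality is classified on its own first.
    p_audio = softmax(linear_scores([[1.0, -0.5], [-1.0, 0.5]], audio))
    p_video = softmax(linear_scores([[0.3, 0.0, 0.2], [-0.3, 0.1, 0.0]], video))
    print(decision_level_fusion(p_audio, p_video, w_audio=0.6))
```

Because the single classifier in `feature_level_fusion` sees both modalities at once, its weights can exploit cross-modal correlations, which is the property the excerpt says decision-level fusion gives up. The sum rule is only one combination choice; product and max rules are common alternatives.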

“…
Reference | Features | Technique
Wang & Guan [180] | Gabor wavelets; Prosody, MFCC | Fisher's LDA
Zeng et al [179] | Motion units, Prosody | MFHMM
Zeng et al [178] | LLP, Prosody | Adaboost + MHMM
Zeng et al [256] | Motion units, Prosody | SNoW
Zeng et al [221] | Motion units, Prosody | MFHMM
Zeng et al [223] | Motion units, Prosody | HMM
Wöllmer et al [192] | MFCC, Facial flow | LSTM
Karpouzis et al [186] | FPs, Prosody | RNN
Caridakis et al [189] | Facial points, Prosody | RNN
Fragopanagos & Taylor [188] | FAPs, Prosody | Neural network (NN)
Petridis & Pantic [184] | Facial points, MFCC | Adaboost, NN
Ringeval et al [191] | Low-level descriptors (MFCC); LGBP from three orthogonal planes |
Schoneveld et al [219] | Face image, MFCC | CNN->LSTM
Hossain &…”

Section: G. Audio-visual Analysis in Affect Recognition (mentioning; confidence: 99%)

“…Muhammad [251] | MFCC & Image | CNN->ELM->SVM
TABLE 9: Audio-visual analysis for affect recognition is categorized based on features and techniques.
…[190], [255] and different types of HMMs [178], [179], [181], [221], [223].…”

Section: G. Audio-visual Analysis in Affect Recognition (mentioning; confidence: 99%)
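Several entries above rely on HMM variants (MHMM, MFHMM, AdaMHMM) whose multistream details are not given in these excerpts. The sketch below shows only the common single-stream baseline they build on, under stated assumptions: one discrete-emission HMM per affect class, with a sequence assigned to the class whose model gives it the highest likelihood, computed by the scaled forward algorithm. The toy parameters and the observation symbols 0 and 1 (standing in for quantized feature codes) are hypothetical.

```python
import math


def log_forward(obs, start, trans, emit):
    """Log-likelihood log P(obs | model) of a discrete-emission HMM.

    Uses the scaled forward algorithm: alpha is renormalized at every step and
    the logs of the scaling factors are summed, which avoids underflow.
    start[i]    = P(first state is i)
    trans[i][j] = P(next state is j | current state is i)
    emit[i][k]  = P(observing symbol k | state i)
    """
    n = len(start)
    log_lik = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[i] * emit[i][symbol] for i in range(n)]
        else:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def classify(obs, models):
    """Pick the label whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda label: log_forward(obs, *models[label]))


if __name__ == "__main__":
    # Hypothetical 2-state models over two quantized symbols, 0 and 1.
    trans = [[0.9, 0.1], [0.1, 0.9]]
    models = {
        "positive": ([0.5, 0.5], trans, [[0.2, 0.8], [0.3, 0.7]]),
        "negative": ([0.5, 0.5], trans, [[0.8, 0.2], [0.7, 0.3]]),
    }
    print(classify([1, 1, 0, 1, 1], models))  # positive
```

In practice the per-class parameters would be trained (for example with Baum-Welch) and audio-visual variants couple or fuse several such streams.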


Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey
Shahabaz, Sarkar (2024). IEEE Access.


The joint analysis of audio and video is a powerful tool that can be applied to various contexts, including action, speech, and sound recognition, audio-visual video parsing, emotion recognition in affective computing, and self-supervised training of deep learning models. Solving these problems often involves tackling core audio-visual tasks, such as audio-visual source localization, audio-visual correspondence, and audio-visual source separation, which can be combined in various ways to achieve the desired results. This paper provides a review of the literature in this area, discussing the advancements, history, and datasets of audio-visual learning methods for various application domains. It also presents an overview of the reported performances on standard datasets and suggests promising directions for future research.

Index Terms: computer vision, audio-video analysis, contrastive learning, multi-modal analysis
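The abstract names audio-visual correspondence and contrastive learning without describing a specific objective. As a hedged illustration, the sketch below implements symmetric InfoNCE, one widely used contrastive loss, which is not necessarily what any surveyed method uses. Embeddings of the audio and video from the same clip form a positive pair, and all other pairings in the batch act as negatives. The function names and the temperature of 0.1 are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _mean_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(s - m) for s in row))  # log-sum-exp
        total += log_z - row[i]
    return total / len(logits)


def info_nce(audio_emb, video_emb, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired audio/video embeddings.

    audio_emb[i] and video_emb[i] come from the same clip (positive pair);
    every other pairing in the batch is treated as a negative.
    """
    a = [l2_normalize(x) for x in audio_emb]
    v = [l2_normalize(x) for x in video_emb]
    n = len(a)
    logits = [
        [sum(p * q for p, q in zip(a[i], v[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    logits_t = [list(col) for col in zip(*logits)]  # video-to-audio direction
    return 0.5 * (_mean_cross_entropy(logits) + _mean_cross_entropy(logits_t))


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(audio, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: near 0
    print(info_nce(audio, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: about 10
```

Minimizing this loss pulls matching audio and video embeddings together and pushes mismatched ones apart; in real training the embeddings come from learned encoders rather than fixed vectors.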


Audio-Visual Spontaneous Emotion Recognition
Zeng, Roisman, et al. Artificial Intelligence for Human Computing.


Abstract. Automatic multimodal recognition of spontaneous emotional expressions is a largely unexplored and challenging problem. In this paper, we explore audio-visual emotion recognition in a realistic human conversation setting, the Adult Attachment Interview (AAI). Based on the assumption that facial expression and vocal expression reflect the same coarse affective state, positive and negative emotion sequences are labeled according to the Facial Action Coding System. Facial texture in the visual channel and prosody in the audio channel are integrated in the framework of the Adaboost multi-stream hidden Markov model (AdaMHMM), in which the Adaboost learning scheme is used to fuse the component HMMs. Our approach is evaluated in AAI spontaneous emotion recognition experiments.
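The abstract states that Adaboost is used to fuse the component HMMs but gives no details. As an illustration only, the sketch below applies standard discrete AdaBoost to a fixed pool of component classifiers, one per stream, so that each selected stream receives a weight alpha = 0.5 * ln((1 - err) / err) and the final decision is the sign of the weighted vote. The ±1 outputs are toy labels standing in for per-stream HMM decisions, the function names are hypothetical, and AdaMHMM's actual construction of its component HMMs may differ.

```python
import math


def adaboost_combine(component_preds, labels, rounds):
    """Discrete AdaBoost over a fixed pool of component classifiers.

    component_preds[m][i] is the +1/-1 prediction of component m on sample i;
    labels[i] is the true +1/-1 label. Each round picks the component with the
    lowest weighted error and gives it weight alpha = 0.5 * ln((1 - err) / err).
    Returns a list of (component_index, alpha) pairs.
    """
    n = len(labels)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        errors = [sum(w[i] for i in range(n) if preds[i] != labels[i])
                  for preds in component_preds]
        m = min(range(len(errors)), key=errors.__getitem__)
        err = min(max(errors[m], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        if err >= 0.5:
            break  # no component beats chance on the weighted data
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((m, alpha))
        # Reweight: misclassified samples gain weight, correct ones lose weight.
        w = [w[i] * math.exp(-alpha * labels[i] * component_preds[m][i]) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]
    return ensemble


def predict(ensemble, sample_preds):
    """Sign of the alpha-weighted vote; sample_preds[m] is component m's output."""
    score = sum(alpha * sample_preds[m] for m, alpha in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    labels = [1, 1, -1, -1, 1, -1]   # toy: +1 positive, -1 negative
    visual = [1, 1, -1, 1, 1, -1]    # hypothetical facial-texture stream
    audio = [1, -1, -1, -1, 1, -1]   # hypothetical prosody stream
    ens = adaboost_combine([visual, audio], labels, rounds=2)
    print(ens)
    print(predict(ens, [1, -1]))  # streams disagree; the higher-alpha stream wins
```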

