Multimodal Emotion Recognition

Haq, Sanaul; Jackson, Philip J. B.

doi:10.4018/978-1-61520-919-4.ch017

Cited by 100 publications

(34 citation statements)

References 69 publications

(65 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The second corpus is the SAVEE database, created by Haq and Jackson (2010). This corpus contains speech recordings from four male native English speakers.…”

Section: Experimental Methodologymentioning

confidence: 99%

Developing crossmodal expression recognition based on a deep neural model

Barros

Wermter

2016

Adaptive Behavior

View full text Add to dashboard Cite

A robot capable of understanding emotion expressions can increase its own capability of solving problems by using emotion expressions as part of its own decision-making, in a similar way to humans. Evidence shows that the perception of human interaction starts with an innate perception mechanism, where the interaction between different entities is perceived and categorized into two very clear directions: positive or negative. While the person is developing during childhood, the perception evolves and is shaped based on the observation of human interaction, creating the capability to learn different categories of expressions. In the context of human–robot interaction, we propose a model that simulates the innate perception of audio–visual emotion expressions with deep neural networks, that learns new expressions by categorizing them into emotional clusters with a self-organizing layer. The proposed model is evaluated with three different corpora: The Surrey Audio–Visual Expressed Emotion (SAVEE) database, the visual Bi-modal Face and Body benchmark (FABO) database, and the multimodal corpus of the Emotion Recognition in the Wild (EmotiW) challenge. We use these corpora to evaluate the performance of the model to recognize emotional expressions, and compare it to state-of-the-art research.

show abstract

“…The second corpus is the SAVEE database, created by Haq and Jackson (2010). This corpus contains speech recordings from four male native English speakers.…”

Section: Experimental Methodologymentioning

confidence: 99%

Developing crossmodal expression recognition based on a deep neural model

Barros

Wermter

2016

Adaptive Behavior

View full text Add to dashboard Cite

show abstract

“…We now describe how we employ the CycleGAN model to learn the expression mapping in the blendshape weights space. From the reconstructed blendshape weights of the used training video data set [HJ10], we first sample training expression pairs

(x_{i}, y_{i})

independently from a source domain and a target domain. Next, given samples in two expression domains X and Y (e.g.…”

Section: Cycle‐consistent Expression Mappingmentioning

confidence: 99%

“…We used the Surrey Audio‐Visual Expressed Emotion (SAVEE) data set [HJ10] for model training. The data set contains video clips recorded from four male actors with multiple expressions, uttering 120 sentences in English.…”

Section: Cycle‐consistent Expression Mappingmentioning

confidence: 99%

Real‐Time Facial Expression Transformation for Monocular RGB Video

Deng

2018

Computer Graphics Forum

View full text Add to dashboard Cite

This paper describes a novel real‐time end‐to‐end system for facial expression transformation, without the need of any driving source. Its core idea is to directly generate desired and photo‐realistic facial expressions on top of input monocular RGB video. Specifically, an unpaired learning framework is developed to learn the mapping between any two facial expressions in the facial blendshape space. Then, it automatically transforms the source expression in an input video clip to a specified target expression through the combination of automated 3D face construction, the learned bi‐directional expression mapping and automated lip correction. It can be applied to new users without additional training. Its effectiveness is demonstrated through many experiments on faces from live and online video, with different identities, ages, speeches and expressions.

show abstract

“…The Surrey Audio-Visual Expressed Emotion (SAVEE) database [12] consists of footage of 4 British male actors with six basic emotions(disgust, anger, happy, sad, fear surprise) and neutral state. A total of 480 phonetically balanced sentences are selected from the standard TIMIT corpus [13] for every emotional state.…”

Section: A Datasetsmentioning

confidence: 99%

Multimodal Speech Driven Facial Shape Animation Using Deep Neural Networks

Asadiabadi

Sadiq

Erzin

2018

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

View full text Add to dashboard Cite

In this paper we present a deep learning multimodal approach for speech driven generation of face animations. Training a speaker independent model, capable of generating different emotions of the speaker, is crucial for realistic animations. Unlike the previous approaches which either use acoustic features or phoneme label features to estimate the facial movements, we utilize both modalities to generate natural looking speaker independent lip animations synchronized with affective speech. A phoneme-based model qualifies generation of speaker independent animation, whereas an acoustic feature-based model enables capturing affective variation during the animation generation. We show that our multimodal approach not only performs significantly better on affective data, but improves performance over neutral data as well. We evaluate the proposed multimodal speech-driven animation model using two large scale datasets, GRID and SAVEE, by reporting the mean squared error (MSE) over various network structures.

show abstract

Multimodal Emotion Recognition

Cited by 100 publications

References 69 publications

Developing crossmodal expression recognition based on a deep neural model

Developing crossmodal expression recognition based on a deep neural model

Real‐Time Facial Expression Transformation for Monocular RGB Video

Multimodal Speech Driven Facial Shape Animation Using Deep Neural Networks

Contact Info

Product

Resources

About