Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge 2016
DOI: 10.1145/2988257.2988268

Multimodal Emotion Recognition for AVEC 2016 Challenge

Abstract: This paper describes a system for emotion recognition and its application to the dataset from the AV+EC 2016 Emotion Recognition Challenge. The realized system was produced and submitted to the AV+EC 2016 evaluation, making use of all three modalities (audio, video, and physiological data). Our work primarily focused on features derived from audio. The original audio features were complemented with bottleneck features and with text-based emotion recognition based on transcribing the audio by an automatic s…

Cited by 46 publications (40 citation statements) · References 16 publications
“…Povolny et al. used all features to train linear regressors to predict a value for each frame, and considered two methods for incorporating contextual information: simple frame stacking and temporal content summarization by applying statistics to local windows. In contrast, in this work we show that considering temporal dependencies longer than those presented in [6,7] is critical to improving continuous emotion recognition performance.…”
Section: Related Work
confidence: 58%
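The two context strategies described in the statement above can be sketched as follows. This is a minimal NumPy illustration; the function names, context size, and window length are our own assumptions, not the configuration used by Povolny et al.

```python
import numpy as np

def stack_frames(feats: np.ndarray, context: int = 2) -> np.ndarray:
    """Simple frame stacking: concatenate each frame with its +/- `context` neighbours.

    feats: (num_frames, num_features) matrix -> (num_frames, num_features * (2*context + 1)).
    """
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i : i + len(feats)] for i in range(2 * context + 1)])

def window_stats(feats: np.ndarray, win: int = 25) -> np.ndarray:
    """Temporal content summarization: mean and std of each feature over a centred local window."""
    half = win // 2
    padded = np.pad(feats, ((half, half), (0, 0)), mode="edge")
    rows = []
    for t in range(len(feats)):
        w = padded[t : t + win]
        rows.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.vstack(rows)
```

Both transforms keep one output row per input frame, so either can feed the per-frame linear regressors mentioned in the quote.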
“…The higher-level audio features were used to train linear SVRs. Povolny et al. [7] used eGeMAPS [8] features along with a set of higher-level bottleneck features extracted from a DNN trained for automatic speech recognition (ASR) to train linear regressors. The higher-level features were produced from an initial set of 24 Mel filterbank (MFB) features and four different estimates of the fundamental frequency (F0).…”
Section: Related Work
confidence: 99%
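As a rough illustration of the feature pipeline this statement describes, the sketch below concatenates per-frame eGeMAPS descriptors with DNN bottleneck activations and fits a plain linear regressor. The arrays are assumed to be precomputed; the bottleneck dimensionality and frame count are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

num_frames = 1000
egemaps = np.random.randn(num_frames, 88)     # eGeMAPS functionals per frame (88-dim set)
bottleneck = np.random.randn(num_frames, 30)  # DNN bottleneck activations (dim assumed)
arousal = np.random.randn(num_frames)         # gold-standard labels (dummy values here)

X = np.hstack([egemaps, bottleneck])          # combined per-frame feature vector
reg = LinearRegression().fit(X, arousal)
pred = reg.predict(X)                         # one predicted value per frame
```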
“…The most directly comparable system is a deep BLSTM-RNN regressor trained with MSE loss, as both MSE and CCE are frame-level objectives. Previous works have reported that using CCC as the objective function consistently outperforms MSE for regression models [16,17]. In addition to being more in line with the evaluation metric, CCC has the advantage over MSE and CCE in that it is an utterance-level objective that takes into account the overall shape of the time series.…”
Section: Deep BLSTM-RNN Regressor
confidence: 99%
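For reference, the CCC objective contrasted with MSE above is typically minimised as 1 − CCC computed over the whole sequence. The PyTorch sketch below is our own minimal formulation, not code from [16,17].

```python
import torch

def ccc_loss(pred: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """1 - CCC over a whole sequence; both inputs are 1-D tensors of equal length."""
    pred_mean, gold_mean = pred.mean(), gold.mean()
    pred_var = pred.var(unbiased=False)
    gold_var = gold.var(unbiased=False)
    covar = ((pred - pred_mean) * (gold - gold_mean)).mean()
    ccc = 2 * covar / (pred_var + gold_var + (pred_mean - gold_mean) ** 2)
    return 1 - ccc
```

Because the means and variances are computed over the entire sequence, every frame's gradient depends on the whole utterance, which is what makes CCC an utterance-level objective in contrast to frame-wise MSE.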
“…The best performer of AVEC 2016, Brady et al., used SVR trained on sparse-coded higher-level representations of various types of audio features [15]. Povolny et al. trained a set of linear regressors on eGeMAPS augmented with deep bottleneck features from Deep Neural Network (DNN) acoustic models [16]. Trigeorgis et al. trained a convolutional RNN directly on raw waveform [17].…”
Section: AVEC Approaches
confidence: 99%
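Of the three systems listed, the raw-waveform convolutional RNN is the least conventional; a minimal PyTorch sketch of that idea follows. The layer sizes, kernel settings, and class name here are illustrative assumptions, not the published architecture of Trigeorgis et al. [17].

```python
import torch
import torch.nn as nn

class ConvRNNRegressor(nn.Module):
    """Maps raw audio samples directly to one emotion value per time step."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(                 # waveform -> local acoustic features
            nn.Conv1d(1, 40, kernel_size=80, stride=10),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.rnn = nn.LSTM(40, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)            # per-step regression head

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) raw audio
        x = self.conv(wav.unsqueeze(1))            # (batch, 40, steps)
        x, _ = self.rnn(x.transpose(1, 2))         # (batch, steps, hidden)
        return self.out(x).squeeze(-1)             # (batch, steps)
```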
“…Some people cry out when feeling sad, while others keep a neutral expression in an attempt to hide their true feelings [23], adding to the difficulty of recognising emotions. Research on previous AVEC challenges has explored various methods: extracting multimodal features [6,35], the fusion of multiple modalities [8,34] and numerous deep learning architectures [33,43] for recognising emotions.…”
Section: Introduction
confidence: 99%