2021
DOI: 10.1016/j.patrec.2021.03.007
Leveraging recent advances in deep learning for audio-visual emotion recognition

Cited by 162 publications (61 citation statements) · References 15 publications
“…Recognition of human emotion using audio and visual features has also been studied in previously proposed work. Recently, Schoneveld et al. [11] used deep feature representations of the audio and visual modalities to improve the accuracy of the FER task. In addition, Zhou et al. [12] explored audio features using speech spectrograms and log-Mel spectrograms, and evaluated facial features with different CNNs and different emotion-pretraining strategies.…”
Section: Related Work
confidence: 99%
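The log-Mel spectrogram features mentioned in the statement above can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited papers' pipeline; the frame length, hop size, and number of Mel bands are illustrative defaults:

```python
import numpy as np

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Compute a log-scaled Mel spectrogram from a raw waveform."""
    # Short-time Fourier transform via windowed, framed FFTs
    window = np.hanning(n_fft)
    frames = [wave[i:i + n_fft] * window
              for i in range(0, len(wave) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (frames, n_fft//2+1)

    # Triangular Mel filterbank mapping linear FFT bins to Mel bands
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2),
                                    n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    mel_power = power @ fbank.T                        # (frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel_power, 1e-10))
```

The resulting (frames × Mel-bands) matrix is what 2D-CNNs then treat as an image-like input.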
“…Nguyen et al. [31] proposed a deep model of two-stream auto-encoders and LSTMs to simultaneously learn compact representative features from the A and V modalities for dimensional ER. Schoneveld et al. [40] explored knowledge distillation using a teacher-student model for the V modality and a CNN model for the A modality using spectrograms, fused with RNNs. Deng et al. [2] proposed an iterative self-distillation method for modeling the uncertainties in the labels in a multi-task framework.…”
Section: Related Work 2.1 A-V Fusion Based Emotion Recognition
confidence: 99%
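The teacher-student knowledge distillation referenced above is, in its standard form, a weighted blend of a temperature-softened teacher-matching loss and ordinary cross-entropy on the hard labels. A minimal NumPy sketch (the temperature and weighting values are illustrative, not the cited work's settings):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard distillation objective: a soft teacher-matching term
    (cross-entropy against softened teacher outputs, scaled by T^2)
    blended with hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, T)               # soft targets
    log_p_student = np.log(softmax(student_logits, T))   # student log-probs at T
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * (T ** 2)
    hard = -np.log(
        softmax(student_logits)[np.arange(len(labels)), labels]
    ).mean()
    return alpha * soft + (1 - alpha) * hard
```

The soft term is minimized when the student reproduces the teacher's softened distribution, which is what transfers the teacher's "dark knowledge" about inter-class similarity.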
“…Several ER approaches have been proposed for video-based dimensional ER using convolutional neural networks (CNNs) to obtain deep features and recurrent neural networks (RNNs) to capture the temporal dynamics [40,43]. Deep models have also been widely explored for vocal emotion recognition, typically using spectrograms with 2D-CNNs [40,44] or raw waveforms with 1D-CNNs [43].…”
Section: Introduction
confidence: 99%
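The raw-waveform-with-1D-CNN route mentioned above amounts to sliding a bank of learned filters directly over the audio samples instead of a spectrogram. A toy sketch of one such layer (the random kernels stand in for trained weights, and the stride and kernel size are arbitrary):

```python
import numpy as np

def conv1d_relu(wave, kernels, stride=4):
    """Valid-mode strided 1D convolution of a raw waveform with a bank
    of kernels, followed by ReLU — the first layer of a 1D-CNN."""
    k = kernels.shape[1]
    starts = range(0, len(wave) - k + 1, stride)
    frames = np.stack([wave[i:i + k] for i in starts])   # (T, k)
    return np.maximum(frames @ kernels.T, 0.0)           # (T, n_kernels)

rng = np.random.default_rng(0)
wave = rng.normal(size=16000)                 # dummy 1 s clip at 16 kHz
feats = conv1d_relu(wave, rng.normal(size=(32, 64)))     # 32 filters of 64 taps
```

Stacking such layers lets the network learn its own time-frequency decomposition rather than relying on a fixed spectrogram front end.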
“…At the same time, deep learning has shown outstanding contributions in many fields, such as speech recognition and machine translation. Some scholars have achieved excellent results in the music emotion classification competition task (Music Information Retrieval Evaluation Exchange) through the in-depth application of deep learning technology, combined with emotion recognition methods for music audio [7][8][9], as shown in Figure 1.…”
Section: Introduction
confidence: 99%