2020 IEEE 4th International Conference on Image Processing, Applications and Systems (IPAS) 2020
DOI: 10.1109/ipas50080.2020.9334943
Audiovisual Classification of Group Emotion Valence Using Activity Recognition Networks

Abstract: Despite recent efforts, accuracy in group emotion recognition is still generally low. One of the reasons for these underwhelming performance levels is the scarcity of available labeled data which, like the literature approaches, is mainly focused on still images. In this work, we address this problem by adapting an inflated ResNet-50 pretrained for a similar task, activity recognition, where large labeled video datasets are available. Audio information is processed using a Bidirectional Long Short-Term Memory …
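As a rough illustration of the weight inflation the abstract alludes to (the I3D-style procedure of replicating pretrained 2D convolution kernels along a new time axis and rescaling so pretrained activations are preserved), here is a minimal NumPy sketch. The function name and toy shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def inflate_conv2d_to_3d(w2d, time_depth):
    """Inflate a pretrained 2D conv kernel of shape
    (out_ch, in_ch, kH, kW) into a 3D kernel of shape
    (out_ch, in_ch, T, kH, kW) by repeating it T times along a new
    time axis and dividing by T, so a temporally constant input
    produces the same activations as the original 2D filter."""
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], time_depth, axis=2)
    return w3d / time_depth

# Toy example: 4 output channels, 3 input channels, 3x3 spatial kernel.
w2d = np.random.rand(4, 3, 3, 3).astype(np.float32)
w3d = inflate_conv2d_to_3d(w2d, time_depth=3)
print(w3d.shape)                          # (4, 3, 3, 3, 3)
# Summing the inflated kernel over time recovers the original 2D kernel.
print(np.allclose(w3d.sum(axis=2), w2d))  # True
```

Applied to every convolution in a ResNet-50, this yields a 3D network that can be fine-tuned on video while starting from weights learned on large image or activity-recognition datasets.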


Cited by 13 publications (5 citation statements)
References 21 publications
“…
Method | Details | Modality | Accuracy, %
VGG13 + VGG16 + ResNet [4] | Ensemble | video | 59.42
Hu et al. [25] | Ensemble | audio, video | 59.01
Four face recognition CNNs + STAT + FFT [18] | Ensemble | video | 56.7
Noisy student with iterative training [26] | Single model | video | 55.17
Noisy student w/o iterative training [26] | Single model | video | 52.49
DenseNet-161 [27] | Single model | video | 51.44
Face attention network (FAN) [28] | Single model | video | 51.18
LBP-TOP (baseline) [11] | Single […]

Method | Details | Modality | Accuracy, %
Hybrid Networks [5] | Ensemble | audio, video | 74.28
K-injection network [29] | Ensemble | audio, video | 66.19
DenseNet-121 (FER+) [5] | Single model | video | 64.75
Activity Recognition Networks [30] | […]

[…]sults, the accuracy of MobileNet features here is 2% higher when compared to EfficientNet, so that we could claim that both models have their advantages in various emotion recognition tasks.…”
Section: Methods
confidence: 99%
“…Classifying human actions and activities is a challenging topic that benefited greatly from improvements in computational capabilities and neural networks. Action recognition and activity recognition are often used interchangeably [70]. This task is fundamental for scene understanding to capture object interactions.…”
Section: E. Action Recognition
confidence: 99%
“…Beyond detecting violence, the recognition of occupants' activity using different modalities has also been explored, specifically using audiovisual features [1]. The use of audio features stems from their application to the classification of group emotion [40]. This work focused primarily on the available hardware and energy consumption constraints associated with implementing an activity recognition system in a vehicle.…”
Section: In-vehicle Occupant Activity Recognition
confidence: 99%
“…The baseline model used in this work is based on the model proposed in [40], presented in Figure 1. It uses RGB data for recognizing actions, and it has achieved results comparable to state-of-the-art methodologies.…”
Section: A. Baseline
confidence: 99%