2022 26th International Conference on Pattern Recognition (ICPR)
DOI: 10.1109/icpr56361.2022.9956592
Self-attention fusion for audiovisual emotion recognition with incomplete data

Cited by 32 publications (3 citation statements)
References 13 publications
“…The Softmax activation promotes competition in the attention matrix [43], thus highlighting more important attributes and timestamps of each modality. As a result, it provides the importance score of each key relative to each query, that is, the importance of each representation of modality α with respect to modality β. Consequently, the features that exhibit agreement between the two modalities exert the greatest influence on the final prediction, thereby guiding the model to learn features that demonstrate a substantial level of agreement across modalities.…”
Section: Non-invasive Modal Fusion Transformer Encoder
Mentioning, confidence: 99%
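The mechanism described in the statement above is cross-modal scaled dot-product attention with a softmax over the keys. The following is a minimal PyTorch sketch of that idea, assuming modality α supplies the queries and modality β the keys and values; the function name, tensor shapes, and toy dimensions are illustrative and not taken from the cited paper.

import torch
import torch.nn.functional as F

def cross_modal_attention(x_alpha, x_beta, d_k):
    # x_alpha: (batch, T_alpha, d_k) representations of modality alpha (queries)
    # x_beta:  (batch, T_beta, d_k) representations of modality beta (keys/values)
    # Similarity of every alpha timestep (query) to every beta timestep (key).
    scores = torch.matmul(x_alpha, x_beta.transpose(-2, -1)) / d_k ** 0.5
    # The softmax makes the scores compete: keys that agree strongly with a query
    # receive most of the weight, so cross-modally consistent features dominate
    # the fused representation.
    weights = F.softmax(scores, dim=-1)   # (batch, T_alpha, T_beta)
    return torch.matmul(weights, x_beta)  # weighted sum of beta representations

# Toy usage: 8 audio timesteps attending over 12 visual timesteps.
audio = torch.randn(2, 8, 64)
video = torch.randn(2, 12, 64)
fused = cross_modal_attention(audio, video, d_k=64)
print(fused.shape)  # torch.Size([2, 8, 64])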
“…Furthermore, in addition to these brain-inspired methods, we compare the performance of MulT (Chumachenko, Iosifidis, and Gabbouj 2022) with our model specifically in the RAVDESS dataset. The accuracy achieved by MulT on seven classes is 74.16%, and our model outperforms MulT by 25.47%.…”
Section: Overall Performance
Mentioning, confidence: 99%
“…Recently, several multimodal deep-learning models have been designed to be trained concurrently with data of multiple modalities, such as vision, auditory, and sensor data. In particular, many studies [7][8][9][10][11] have proposed training speech-recognition models using various forms of data such as audio and text. Trained multimodal deep-learning models can train from diverse information and achieve a high prediction accuracy.…”
Section: Multimodal Deep Learning
Mentioning, confidence: 99%