2022
DOI: 10.1109/lsp.2022.3210836

Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition

Cited by 39 publications (4 citation statements) | References 27 publications
“…They used deep-learning models such as InceptionV3, VGG16, and VGG19 to achieve a maximum accuracy of 93.33% [17]. In [18], the authors presented a contextual cross-modal transformer module for the fusion of textual and audio modalities, applied to the IEMOCAP and MELD datasets, achieving a maximum accuracy of 84.27%. In [19], the authors described a speech recognition technique based on frequency-domain features of an Arabic dataset using SVM, KNN, and MLP, achieving a maximum recognition accuracy of 77.14%.…”
Section: Related Work
confidence: 99%
“…Yang et al. incorporate context data into the current speech by embedding prior statements between interlocutors, which improves the emotional depiction of the present utterance. The suggested cross-modal converter module then focuses on the interconnections between the text and auditory modalities, adaptively fostering modality fusion (Yang et al., 2022). Based on the papers discussed above, it is clear that multimodality currently plays a significant role in HRI research.…”
Section: Recent Advancements Of Application For Multi-modal Human–Rob...
confidence: 99%
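The contextual cross-modal fusion described in the statement above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it is an illustrative PyTorch module (the name CrossModalFusion, all dimensions, and the single-layer attention design are assumptions) showing how contextual text features can attend to audio features via cross-modal attention before emotion classification.

# Minimal illustrative sketch (not the authors' code): cross-modal attention
# fusion of text and audio utterance features, assuming both modalities are
# already encoded into fixed-size sequences. Names and dimensions are hypothetical.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4, num_classes=4):
        super().__init__()
        # Text queries attend to audio keys/values (cross-modal attention).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_feats, audio_feats):
        # text_feats:  (batch, T_text, dim)  contextual text features
        # audio_feats: (batch, T_audio, dim) frame-level acoustic features
        attended, _ = self.cross_attn(text_feats, audio_feats, audio_feats)
        fused_text = self.norm(text_feats + attended)  # residual fusion
        # Pool each modality over time and concatenate for emotion prediction.
        pooled = torch.cat([fused_text.mean(dim=1),
                            audio_feats.mean(dim=1)], dim=-1)
        return self.classifier(pooled)


# Usage example with random tensors standing in for encoded utterances.
model = CrossModalFusion()
text = torch.randn(2, 20, 256)   # e.g., token-level text embeddings
audio = torch.randn(2, 50, 256)  # e.g., frame-level acoustic embeddings
logits = model(text, audio)      # shape (2, num_classes)

In this sketch the text sequence acts as the query and the audio sequence as key/value, so the fused representation is adaptively weighted by how relevant each audio frame is to each text token; the residual connection preserves the original textual context.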
“…Previous studies have shown that more effective and valuable joint multimodal representations can be obtained by combining complementary features from different modalities (Shraga et al. 2020; Springstein, Müller-Budack, and Ewerth 2021), benefiting from the evolution of learning-based techniques (Yang et al. 2023c; Chen et al. 2024; Li, Yang, and Zhang 2023; Yang et al. 2023d). Most MSA works (Hazarika, Zimmermann, and Poria 2020; Yu et al. 2021; Yang et al. 2022a, 2022d, 2023b, 2022b; Li, Wang, and Cui 2023) are based on the assumption that all modalities are available during the training and testing phases. In real applications, this assumption does not hold due to many inevitable factors, such as privacy, device, or security constraints, resulting in significant degradation of model performance.…”
Section: Introduction
confidence: 99%