Gated Mechanism for Attention Based Multi Modal Sentiment Analysis

Kumar, Ayush; Vepa, Jithendra

doi:10.1109/icassp40776.2020.9053012

Cited by 69 publications

(28 citation statements)

References 16 publications


“…We compare the evaluation results of the model on CMU-MOSEI dataset with Graph-MFN [14], B2 + B4 w/multimodal fusion [16], Multilogue-Net [18], and TBJE [19]. The results of 2-class sentiment are shown in Table 1.…”

Section: The Results of CMU-MOSEI Dataset (mentioning)

confidence: 99%


Multimodal Sentiment Analysis Based on Interactive Transformer and Soft Mapping

Guo, Feng, et al. 2022

Wireless Communications and Mobile Computing


Multimodal sentiment analysis aims to harvest people’s opinions or attitudes from multimedia data through fusion techniques. However, existing fusion methods cannot take advantage of the correlation between multimodal data but introduce interference factors. In this paper, we propose an Interactive Transformer and Soft Mapping based method for multimodal sentiment analysis. In the Interactive Transformer layer, an Interactive Multihead Guided-Attention structure composed of a pair of Multihead Attention modules is first utilized to find the mapping relationship between multimodalities. Then, the obtained results are fed into a Feedforward Neural Network. The Soft Mapping layer consisting of stacking Soft Attention module is finally used to map the results to a higher dimension to realize the fusion of multimodal information. The proposed model can fully consider the relationship between multiple modal pieces of information and provides a new solution to the problem of data interaction in multimodal sentiment analysis. Our model was evaluated on benchmark datasets CMU-MOSEI and MELD, and the accuracy is improved by 5.57% compared with the baseline standard.


“…It utilizes self-attention to capture long term context and gating mechanism to selectively learn cross attended features [16].…”

Section: B2 + B4 w/multimodal fusion (mentioning)

confidence: 99%

Multimodal Sentiment Analysis Based on Interactive Transformer and Soft Mapping

Guo, Feng, et al. 2022

Wireless Communications and Mobile Computing
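
The citation statement above describes gating as a way to selectively learn cross-attended features. A minimal sketch of that general idea follows, using only the Python standard library. It is not the cited paper's exact formulation: the scalar sigmoid gate, the function names (`cross_attend`, `gated_fusion`), and the toy inputs are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_attend(queries, keys_values):
    """Scaled dot-product attention of one modality over another.

    Both arguments are lists of equal-width feature vectors.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                         for j in range(dim)])
    return attended

def gated_fusion(original, attended, gate_weights, gate_bias):
    """Blend each original vector with its cross-attended counterpart.

    Assumed gate form (hypothetical): g = sigmoid(w . [x; a] + b),
    one scalar per time step. g near 0 keeps the original features,
    g near 1 passes the cross-attended features through.
    """
    fused = []
    for x, a in zip(original, attended):
        concat = x + a  # list concatenation: [x; a]
        g = sigmoid(sum(w * c for w, c in zip(gate_weights, concat)) + gate_bias)
        fused.append([g * ai + (1.0 - g) * xi for xi, ai in zip(x, a)])
    return fused

# Toy inputs: 2 text steps and 3 audio steps, each with 2 features.
text_seq = [[1.0, 0.0], [0.0, 1.0]]
audio_seq = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
text_over_audio = cross_attend(text_seq, audio_seq)
fused_text = gated_fusion(text_seq, text_over_audio, [0.1, 0.1, 0.1, 0.1], 0.0)
```

In a trained model the gate weights and bias would be learned, so the network itself decides per time step how much cross-modal information to admit; here they are fixed by hand only to keep the sketch runnable.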


“…A gated mechanism could be considered as a special variant of attention mechanism, which also be employed for the cross-modal fusion. Kumar et al [46] proposed a conditional gated mechanism to modulate the information during mining inter-modal interaction.…”

Section: Related Work (mentioning)

confidence: 99%

Cross-Modal Sentiment Sensing with Visual-Augmented Representation and Diverse Decision Fusion

Zhang, Yin 2021

Sensors


The rising use of online media has changed the social customs of the public. Users have become accustomed to sharing daily experiences and publishing personal opinions on social networks. Social data carrying emotion and attitude has provided significant decision support for numerous tasks in sentiment analysis. Conventional methods for sentiment classification only concern textual modality and are vulnerable to the multimodal scenario, while common multimodal approaches only focus on the interactive relationship among modalities without considering unique intra-modal information. A hybrid fusion network is proposed in this paper to capture both inter-modal and intra-modal features. Firstly, in the stage of representation fusion, a multi-head visual attention is proposed to extract accurate semantic and sentimental information from textual contents, with the guidance of visual features. Then, multiple base classifiers are trained to learn independent and diverse discriminative information from different modal representations in the stage of decision fusion. The final decision is determined based on fusing the decision supports from base classifiers via a decision fusion method. To improve the generalization of our hybrid fusion network, a similarity loss is employed to inject decision diversity into the whole model. Empiric results on five multimodal datasets have demonstrated that the proposed model achieves higher accuracy and better generalization capacity for multimodal sentiment analysis.


“…A spoken interaction additionally requires conversational and channel understandability as highlighted in this work. While works have been carried out in understanding disfluency (Wang et al, 2020a; Lin and Wang, 2020) and turn-taking (Aldeneh et al, 2018; Hara et al, 2018), the authors narrowly aim at improving the task specific results by modelling acoustic cues (Aldeneh et al, 2018; Kumar and Vepa, 2020) or training with auxiliary tasks (Aldeneh et al, 2018; Hara et al, 2018; Wang et al, 2020a; Sundararaman et al, 2021). The effort in our work is orthogonal to what has been carried out in the past research.…”

Section: Related Work (mentioning)

confidence: 99%

What BERT Based Language Model Learns in Spoken Transcripts: An Empirical Study

Kumar, Sundararaman, Vepa 2021

Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Self Cite


Language Models (LMs) have been ubiquitously leveraged in various tasks including spoken language understanding (SLU). Spoken language requires careful understanding of speaker interactions, dialog states and speech induced multimodal behaviors to generate a meaningful representation of the conversation. In this work, we propose to dissect SLU into three representative properties: conversational (disfluency, pause, overtalk), channel (speakertype, turn-tasks) and ASR (insertion, deletion, substitution). We probe BERT based language models (BERT, RoBERTa) trained on spoken transcripts to investigate its ability to understand multifarious properties in absence of any speech cues. Empirical results indicate that LM is surprisingly good at capturing conversational properties such as pause prediction and overtalk detection from lexical tokens. On the downsides, the LM scores low on turntasks and ASR errors predictions. Additionally, pre-training the LM on spoken transcripts restrain its linguistic understanding. Finally, we establish the efficacy and transferability of the mentioned properties on two benchmark datasets: Switchboard Dialog Act and Disfluency datasets.
