Multimodal sentiment analysis aims to harvest people’s opinions or attitudes from multimedia data through fusion techniques. However, existing fusion methods cannot take advantage of the correlation between multimodal data but introduce interference factors. In this paper, we propose an Interactive Transformer and Soft Mapping based method for multimodal sentiment analysis. In the Interactive Transformer layer, an Interactive Multihead Guided-Attention structure composed of a pair of Multihead Attention modules is first utilized to find the mapping relationship between multimodalities. Then, the obtained results are fed into a Feedforward Neural Network. The Soft Mapping layer consisting of stacking Soft Attention module is finally used to map the results to a higher dimension to realize the fusion of multimodal information. The proposed model can fully consider the relationship between multiple modal pieces of information and provides a new solution to the problem of data interaction in multimodal sentiment analysis. Our model was evaluated on benchmark datasets CMU-MOSEI and MELD, and the accuracy is improved by 5.57% compared with the baseline standard.
With the rapid popularity and continuous development of social networks, users’ communication and interaction through platforms such as microblogs and forums have become more and more frequent. The comment data on these platforms reflect users’ opinions and sentiment tendencies, and sentiment analysis of comment data has become one of the hot spots and difficulties in current research. In this paper, we propose a BERT-ETextCNN-ELSTM (Bidirectional Encoder Representations from Transformers–Enhanced Convolution Neural Networks–Enhanced Long Short-Term Memory) model for sentiment analysis. The model takes text after word embedding and BERT encoder processing and feeds it to an optimized CNN layer for convolutional operations in order to extract local features of the text. The features from the CNN layer are then fed into the LSTM layer for time-series modeling to capture long-term dependencies in the text. The experimental results proved that compared with TextCNN (Convolution Neural Networks), LSTM (Long Short-Term Memory), TextCNN-LSTM (Convolution Neural Networks–Long Short-Term Memory), and BiLSTM-ATT (Bidirectional Long Short-Term Memory Network–Attention), the model proposed in this paper was more effective in sentiment analysis. In the experimental data, the model reached a maximum of 0.89, 0.88, and 0.86 in terms of accuracy, F1 value, and macro-average F1 value, respectively, on both datasets, proving that the model proposed in this paper was more effective in sentiment analysis of comment data. The proposed model achieved better performance in the review sentiment analysis task and significantly outperformed the other comparable models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.