Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer

Yu, Jianfei; Jiang, Jing; Yang, Li; Xia, Rui

doi:10.18653/v1/2020.acl-main.306

Cited by 139 publications

(111 citation statements)

References 27 publications

Supporting

Mentioning

110

Contrasting

Unclassified


“…Application in Other Tasks. Besides multimodal sentiment analysis, multimodal learning has been applied in many other language tasks, such as Machine Translation (MT) [12,17,30,40], Named Entity Recognition (NER) [19,21,41,47], and parsing [28,29,48].…”

Section: Multimodal Language Learning (mentioning)

confidence: 99%

Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

Han

Chen

Gelbukh

et al. 2021

Proceedings of the 2021 International Conference on Multimodal Interaction

124


Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data. This research area's major concern lies in developing an extraordinary fusion scheme that can extract and integrate key information from various modalities. However, previous work is restricted by the lack of leveraging dynamics of independence and correlation between modalities to reach top performance. To mitigate this, we propose the Bi-Bimodal Fusion Network (BBFN), a novel end-to-end network that performs fusion (relevance increment) and separation (difference increment) on pairwise modality representations. The two parts are trained simultaneously such that the combat between them is simulated. The model takes two bimodal pairs as input due to the known information imbalance among modalities. In addition, we leverage a gated control mechanism in the Transformer architecture to further improve the final output. Experimental results on three datasets (CMU-MOSI, CMU-MOSEI, and UR-FUNNY) verifies that our model significantly outperforms the SOTA.


“…But it is challenging to bridge the gap between text and image. Several related studies with focus on named entity recognition propose to leverage the whole image information by ResNet encoding to augment each word representation, such as (Moon et al, 2018; upon RNN, (Yu et al, 2020b) upon Transformer and on GNN. Besides, several related studies propose to leveraging the fine-grained visual information by object detection, such as (Wu et al, 2020a,b) However, all the above studies completely ignore the sentiment polarity analysis dependent on the detected target, which has great facilitates in practical applications, such as e-commerce.…”

Section: Related Work (mentioning)

confidence: 99%

“…1) RAN (Wu et al, 2020a); a co-attention approach for aspect terms extraction in a multi-modal scenario. 2) UMT (Yu et al, 2020b); 3) OSCGA (Wu et al, 2020b), an NER approach in a multi-modal scenario based on object features with BIO tagging. Note that UMT and OSCGA focus on named entity recognition (NER) with BIO tagging in a multimodal scenario, leveraging the representation ability of transformer and object-level fine-grained visual features, respectively.…”

Section: Baselines (mentioning)

confidence: 99%

“…Note that Rp-BERT is a multi-modal multi-task approach for NER and text-image relation. Although it also leverages cross-modal relation, but it relies on collapsed tagging, which can not attend to different features for different multi-modal sub-tasks.…”

[Table fragments interleaved in the extracted quote; column headers not recoverable: (Yu et al, 2020b) 61.0 60.4 61.6 60.8 60.0 61.7; OSCGA-collapse (Wu et al, 2020b) 63.2 63.1 63.7 63.5 63.5 63.5; RpBERT (Sun et al, 2021b) 48. Approaches, F1 score: (Lu et al, 2018) 81.0; RpBERT (Sun et al, 2021b) 88.1; JML 89.8]

Section: Baselines (mentioning)

confidence: 99%


Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection

Ju

Zhang

Xiao

et al. 2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing


Aspect terms extraction (ATE) and aspect sentiment classification (ASC) are two fundamental and fine-grained sub-tasks in aspect-level sentiment analysis (ALSA). In the textual analysis, jointly extracting both aspect terms and sentiment polarities has been drawn much attention due to the better applications than individual sub-task. However, in the multimodal scenario, the existing studies are limited to handle each sub-task independently, which fails to model the innate connection between the above two objectives and ignores the better applications. Therefore, in this paper, we are the first to jointly perform multi-modal ATE (MATE) and multi-modal ASC (MASC), and we propose a multi-modal joint learning approach with auxiliary cross-modal relation detection for multi-modal aspect-level sentiment analysis (MALSA). Specifically, we first build an auxiliary text-image relation detection module to control the proper exploitation of visual information. Second, we adopt the hierarchical framework to bridge the multi-modal connection between MATE and MASC, as well as separately visual guiding for each sub module. Finally, we can obtain all aspect-level sentiment polarities dependent on the jointly extracted specific aspects. Extensive experiments show the effectiveness of our approach against the joint textual approaches, pipeline and collapsed multi-modal approaches.


“…Thus, we use an interaction mechanism to reinforce them before fusing these information in the NER Module. Instead of directly concatenating these information with hidden representations in the NER module, we follow the previous studies (Zhang et al, 2018; Yu et al, 2020a) to use a gate function to dynamically control the amount of information flowing by infusing the expedient part while excluding the irrelevant part. The gate function uses the information from the NER Module to guide the process, which is described formally as follows:…”

Section: NER Module (mentioning)

confidence: 99%

Modularized Interaction Network for Named Entity Recognition

Li

Wang

Hui

et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


Although the existing Named Entity Recognition (NER) models have achieved promising performance, they suffer from certain drawbacks. The sequence labeling-based NER models do not perform well in recognizing long entities as they focus only on word-level information, while the segment-based NER models which focus on processing segment instead of single word are unable to capture the word-level dependencies within the segment. Moreover, as boundary detection and type prediction may cooperate with each other for the NER task, it is also important for the two subtasks to mutually reinforce each other by sharing their information. In this paper, we propose a novel Modularized Interaction Network (MIN) model which utilizes both segment-level information and word-level dependencies, and incorporates an interaction mechanism to support information sharing between boundary detection and type prediction to enhance the performance for the NER task. We have conducted extensive experiments based on three NER benchmark datasets. The performance results have shown that the proposed MIN model has outperformed the current state-of-the-art models.
