Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.306
Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer

Abstract: In this paper, we study Multimodal Named Entity Recognition (MNER) for social media posts. Existing approaches for MNER mainly suffer from two drawbacks: (1) despite generating word-aware visual representations, their word representations are insensitive to the visual context; (2) most of them ignore the bias brought by the visual context. To tackle the first issue, we propose a multimodal interaction module to obtain both image-aware word representations and word-aware visual representations. To alleviate the…
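The abstract describes a multimodal interaction module that produces image-aware word representations and word-aware visual representations. One common way to realize both directions is bidirectional cross-modal attention; the NumPy sketch below is illustrative only (the function name `cross_modal_attention`, the shapes, and the scaled dot-product form are assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(words, regions):
    """Bidirectional cross-modal attention (illustrative sketch).

    words:   (n, d) word representations
    regions: (m, d) visual region representations (shared dimension d)
    Each word attends over image regions (image-aware words), and each
    region attends over words (word-aware visual representations).
    """
    d = words.shape[1]
    scores = words @ regions.T / np.sqrt(d)                   # (n, m)
    image_aware_words = softmax(scores, axis=-1) @ regions    # (n, d)
    word_aware_regions = softmax(scores.T, axis=-1) @ words   # (m, d)
    return image_aware_words, word_aware_regions

# toy usage: 3 words, 2 image regions, dimension 4
rng = np.random.default_rng(0)
words = rng.standard_normal((3, 4))
regions = rng.standard_normal((2, 4))
iaw, war = cross_modal_attention(words, regions)
```

Each output keeps the shape of its own modality, so both can be fed back into a downstream tagger alongside the original representations.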

Cited by 139 publications (111 citation statements)
References 27 publications
“…Application in Other Tasks. Besides multimodal sentiment analysis, multimodal learning has been applied in many other language tasks, such as Machine Translation (MT) [12,17,30,40], Named Entity Recognition (NER) [19,21,41,47], and parsing [28,29,48].…”
Section: Multimodal Language Learning
confidence: 99%
“…But it is challenging to bridge the gap between text and image. Several related studies focused on named entity recognition propose to leverage the whole-image information via ResNet encoding to augment each word representation, such as (Moon et al., 2018) upon RNN, (Yu et al., 2020b) upon Transformer, and on GNN. Besides, several related studies propose to leverage fine-grained visual information via object detection, such as (Wu et al., 2020a,b). However, all the above studies completely ignore sentiment polarity analysis dependent on the detected target, which greatly facilitates practical applications such as e-commerce.…”
Section: Related Work
confidence: 99%
“…1) RAN (Wu et al., 2020a), a co-attention approach for aspect term extraction in a multimodal scenario. 2) UMT (Yu et al., 2020b). 3) OSCGA (Wu et al., 2020b), an NER approach in a multimodal scenario based on object features with BIO tagging. Note that UMT and OSCGA focus on named entity recognition (NER) with BIO tagging in a multimodal scenario, leveraging the representation ability of the Transformer and object-level fine-grained visual features, respectively.…”
Section: Baselines
confidence: 99%
“…Thus, we use an interaction mechanism to reinforce them before fusing this information in the NER Module. Instead of directly concatenating this information with hidden representations in the NER Module, we follow previous studies (Zhang et al., 2018; Yu et al., 2020a) and use a gate function to dynamically control the amount of information flowing, infusing the expedient part while excluding the irrelevant part. The gate function uses information from the NER Module to guide the process, which is described formally as follows:…”
Section: NER Module
confidence: 99%
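The gate function quoted above (a sigmoid gate that scales an auxiliary signal before fusing it with the hidden representation) can be sketched as below. This is a generic gated-fusion form, assuming a gate computed from the concatenation of the two representations; the exact parameterization in the cited papers may differ, and `gated_fusion`, `W`, and `b` are illustrative names:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h, v, W, b):
    """Fuse a hidden representation h with an auxiliary signal v.

    The gate g in (0, 1)^d is computed from the concatenation [h; v]
    and scales v element-wise, so only the relevant ("expedient") part
    of the auxiliary signal is infused into the output.
    """
    g = sigmoid(W @ np.concatenate([h, v]) + b)  # element-wise gate
    return h + g * v                             # gated residual fusion

# toy usage: d = 4 dimensional representations
rng = np.random.default_rng(0)
d = 4
h = rng.standard_normal(d)          # hidden representation (NER Module)
v = rng.standard_normal(d)          # auxiliary signal to be gated in
W = rng.standard_normal((d, 2 * d))
b = np.zeros(d)
fused = gated_fusion(h, v, W, b)
```

Because the gate lies strictly between 0 and 1, the fused vector always sits element-wise between `h` and `h + v`, which is what lets the model damp irrelevant parts of the auxiliary signal rather than concatenating it wholesale.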