Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1185

Visual Attention Model for Name Tagging in Multimodal Social Media

Abstract: Every day, billions of multimodal posts containing both images and text are shared on social media sites such as Snapchat, Twitter, or Instagram. This combination of image and text in a single message allows for more creative and expressive forms of communication, and has become increasingly common on such sites. This new paradigm brings new challenges for natural language understanding, as the textual component tends to be shorter and more informal, and often can only be understood in combination with the visual context. …

Cited by 187 publications (147 citation statements); References 29 publications.

“…Potential improvements include, for example, accounting for the original multi-label nature of emotion classification, or covering more than only 20 emoji in emoji prediction. There are also other scenarios to be addressed, like sequence tagging (Baldwin et al., 2015; Gimpel et al., 2018), multimodality (Schifanella et al., 2016; Lu et al., 2018), and code-switching tasks (Barman et al., 2014; Vilares et al., 2016). This is similar to the evolution of GLUE (Wang et al., 2019b) into SuperGLUE (Wang et al., 2019a), with both benchmarks contributing to the development of the field in different ways.…”
Section: Discussion (mentioning)
confidence: 99%
“…The attention mechanism was initially proposed in neural machine translation to dynamically adjust the focus on the source sentence (Bahdanau et al., 2014), but its application has since been extended to many areas, including multimodal fusion (Lu et al., 2018; Ghosal et al., 2018). The idea of attention is to use the information in one vector (called the query) to compute a weighted sum over a list of vectors (called the context).…”
Section: Attention Mechanism (mentioning)
confidence: 99%
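The query-context description in the statement above corresponds to standard dot-product attention. The following is a minimal sketch, not code from any of the cited papers; the scaling factor, tensor shapes, and function name are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def attention(query, context):
    """Dot-product attention: use the query to weight and sum the context vectors.

    query:   (batch, dim)           -- e.g. a decoder state or a sentence summary
    context: (batch, seq_len, dim)  -- e.g. encoded source tokens or image regions
    returns: (batch, dim) attended summary and (batch, seq_len) attention weights
    """
    # Score each context vector against the query (scaled dot product).
    scores = torch.bmm(context, query.unsqueeze(-1)).squeeze(-1)      # (batch, seq_len)
    scores = scores / context.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)                               # attention distribution
    # Weighted sum of the context vectors.
    attended = torch.bmm(weights.unsqueeze(1), context).squeeze(1)    # (batch, dim)
    return attended, weights

# Example: two queries, each attending over 5 context vectors of dimension 8.
q = torch.randn(2, 8)
ctx = torch.randn(2, 5, 8)
summary, w = attention(q, ctx)
print(summary.shape, w.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```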
“…Zhong et al. (2016) also studied the combination of images and captions for the task of detecting cyberbullying. For the task of name tagging, formulated as a sequence labeling problem, Lu et al. (2018) apply a visual attention model that focuses on the sub-areas of a photo that are most relevant to the text, which is encoded by a bi-LSTM model. For the task of image-text matching, Wang et al. (2017) compare an embedding network, which projects texts and photos into a joint space where semantically similar texts and photos are close to each other, with a similarity network, which fuses text embeddings and photo embeddings via element-wise multiplication.…”
Section: Introduction (mentioning)
confidence: 99%
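The name-tagging setup summarized in the statement above can be sketched as follows: a bi-LSTM encodes the post's text, its summary acts as a query over CNN features of photo sub-areas, and the attended visual vector augments each token state before tag prediction. This is a minimal illustration under assumed module names and dimensions, not the exact architecture of Lu et al. (2018):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAttentionTagger(nn.Module):
    """Sketch: a bi-LSTM text encoder whose sentence summary attends over image-region
    features; the attended visual vector is concatenated to every token state before
    projecting to tag scores."""

    def __init__(self, vocab_size, emb_dim=100, hid_dim=200, region_dim=512, num_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.region_proj = nn.Linear(region_dim, 2 * hid_dim)  # map image regions into text space
        self.tag_proj = nn.Linear(4 * hid_dim, num_tags)       # token state + visual context -> tags

    def forward(self, tokens, regions):
        # tokens:  (batch, seq_len) word ids of the post's text
        # regions: (batch, num_regions, region_dim) CNN features of photo sub-areas
        states, _ = self.bilstm(self.embed(tokens))              # (batch, seq_len, 2*hid_dim)
        query = states.mean(dim=1)                               # sentence summary used as the query
        keys = self.region_proj(regions)                         # (batch, num_regions, 2*hid_dim)
        scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)
        weights = F.softmax(scores, dim=-1)                      # which sub-areas matter for this text
        visual = torch.bmm(weights.unsqueeze(1), keys).squeeze(1)           # (batch, 2*hid_dim)
        visual = visual.unsqueeze(1).expand(-1, states.size(1), -1)         # broadcast over tokens
        return self.tag_proj(torch.cat([states, visual], dim=-1))           # (batch, seq_len, num_tags)
```

In a full sequence labeling pipeline the per-token scores would typically feed a CRF layer over BIO-style name tags; the sketch stops at token-level logits for brevity.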
“…More recently, the authors of [41] place an attention layer on top of several modality-specific feature encoding layers to model the importance of different modalities in book genre prediction. There are many other works [20, 35, 39, 40] that leverage this technique, i.e., encoding sequential/temporal data for each modality before computing attention weights and fusing the encoded modality-specific features.…”
Section: Related Work (mentioning)
confidence: 99%
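The modality-importance attention described in the statement above amounts to scoring each modality-specific encoding and fusing the encodings by their softmax-normalized weights. A minimal sketch, with illustrative names and dimensions that are not taken from the cited works:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAttentionFusion(nn.Module):
    """Sketch: learn a scalar importance score per modality and fuse the
    modality-specific encodings as their attention-weighted sum."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scores each modality encoding

    def forward(self, modality_vectors):
        # modality_vectors: (batch, num_modalities, dim), e.g. stacked text and image encodings
        scores = self.score(modality_vectors).squeeze(-1)               # (batch, num_modalities)
        weights = F.softmax(scores, dim=-1)                             # importance of each modality
        fused = (weights.unsqueeze(-1) * modality_vectors).sum(dim=1)   # (batch, dim)
        return fused, weights

# Example: fuse text and image encodings of dimension 256 for a downstream classifier.
text_enc = torch.randn(4, 256)
image_enc = torch.randn(4, 256)
fusion = ModalityAttentionFusion(256)
fused, w = fusion(torch.stack([text_enc, image_enc], dim=1))
print(fused.shape, w.shape)  # torch.Size([4, 256]) torch.Size([4, 2])
```

The learned weights make the fusion interpretable: inspecting them shows how much each modality contributed to a given prediction.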