Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018
DOI: 10.18653/v1/d18-1435
Entity-aware Image Caption Generation

Abstract: Current image captioning approaches generate descriptions that lack specific information, such as the named entities involved in the images. In this paper we propose a new task which aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to tackle this problem. We first train a convolutional neural network with a long short-term memory decoder (CNN-LSTM) to generate a template caption based on the input image. Then we use a knowledge …
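The abstract describes a two-stage pipeline: a CNN-LSTM first emits a template caption with typed placeholders, and named entities retrieved from a knowledge base (queried with the hashtags) are then substituted in. A minimal sketch of the second, entity-filling stage, with a hypothetical template and entity lookup (not the paper's actual implementation):

```python
import re

def fill_template(template, entities):
    """Replace each <TYPE> placeholder with the matching named entity.

    `template` is a caption with typed placeholders (e.g. "<PERSON>");
    `entities` maps placeholder types to names retrieved externally
    (in the paper, from a knowledge base queried via the hashtags).
    """
    def substitute(match):
        entity_type = match.group(1)
        # Fall back to the bare type if no entity was retrieved.
        return entities.get(entity_type, entity_type)
    return re.sub(r"<([A-Z_]+)>", substitute, template)

# Hypothetical output of the CNN-LSTM template decoder:
template = "<PERSON> speaks at a rally in <LOCATION>."
entities = {"PERSON": "Barack Obama", "LOCATION": "Chicago"}
print(fill_template(template, entities))
# -> Barack Obama speaks at a rally in Chicago.
```

Keeping entity surface forms out of the decoder's vocabulary is what sidesteps the out-of-vocabulary problem the citing works discuss below.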

Cited by 54 publications (32 citation statements) | References 27 publications
“…So leveraging textual information from the news at the article level or even the sentence level is not fine-grained enough for finding appropriate named entities. Secondly, to overcome the out-of-vocabulary (OOV) problem caused by named entities, previous works [3,12] follow a two-stage framework. They first generate the 'template caption' which contains the entity types (namely 'placeholders').…”
Section: Introduction
confidence: 99%
“…Thus, every representation from knowledge graphs must not be ambiguous regarding how they are presented, demanding consistency. This property is what allows knowledge graphs to be a reliable form of representation when representing large amounts of data-thus making them a viable option when trying to embed certain domains or documents in fields such as natural language processing [14], multimodal processing [15], etc. Inspired by this, the proposed model utilizes knowledge graph embeddings for the extraction of global relations from documents.…”
Section: Knowledge Graph Embedding
confidence: 99%
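The statement above credits knowledge graph embeddings with extracting global relations from documents. As an illustrative sketch (not from the cited work), the classic TransE formulation embeds a triple (head, relation, tail) so that h + r ≈ t, and scores plausibility by the residual norm; the toy vectors below are random stand-ins for learned embeddings:

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE plausibility score: smaller ||h + r - t|| means the
    triple (head, relation, tail) is more likely to hold."""
    return np.linalg.norm(head + relation - tail)

rng = np.random.default_rng(0)
dim = 8
# Toy embeddings; a trained model would learn these from the graph.
h = rng.normal(size=dim)
r = rng.normal(size=dim)
t_true = h + r + rng.normal(scale=0.01, size=dim)  # near-consistent tail
t_rand = rng.normal(size=dim)                      # random tail

assert transe_score(h, r, t_true) < transe_score(h, r, t_rand)
```

The consistency the citing authors emphasize is exactly what makes such a geometric scoring function meaningful across a large graph.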
“…Knowledge-driven Generation. Deep Neural Networks have been applied to generate natural language to describe structured knowledge bases (Duma and Klein, 2013; Konstas and Lapata, 2013; Flanigan et al., 2016; Hardy and Vlachos, 2018; Pourdamghani et al., 2016; Trisedya et al., 2018; Xu et al., 2018; Madotto et al., 2018; Nie et al., 2018), biographies based on attributes (Lebret et al., 2016; Chisholm et al., 2017; Kaffee et al., 2018; Wang et al., 2018a; Wiseman et al., 2018), and image/video captions based on background entities and events (Krishnamoorthy et al., 2013; Lu et al., 2018). To handle unknown words, we design an architecture similar to pointer-generator networks (See et al., 2017) and the copy mechanism (Gu et al., 2016).…”
Section: Related Work
confidence: 99%
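The pointer-generator idea mentioned above handles unknown words by mixing a generator's vocabulary distribution with a copy distribution over source tokens, weighted by a scalar p_gen. A minimal numeric sketch with made-up toy values (not the cited architectures):

```python
import numpy as np

def final_distribution(p_gen, vocab_dist, attn_dist, src_tokens, vocab):
    """Mix the generator's vocabulary distribution with a copy
    distribution over source tokens, weighted by p_gen."""
    # Extend the vocabulary with source tokens the generator lacks.
    ext_vocab = list(vocab) + [w for w in src_tokens if w not in vocab]
    out = np.zeros(len(ext_vocab))
    for i, _ in enumerate(vocab):
        out[i] = p_gen * vocab_dist[i]          # generation path
    for a, w in zip(attn_dist, src_tokens):
        out[ext_vocab.index(w)] += (1.0 - p_gen) * a  # copy path
    return ext_vocab, out

ext_vocab, probs = final_distribution(
    p_gen=0.6,
    vocab_dist=[0.2, 0.5, 0.3],
    attn_dist=[0.7, 0.3],
    src_tokens=["Obama", "speaks"],
    vocab=["a", "man", "speaks"],
)
# "Obama" is OOV for the generator but still receives copy probability.
print(ext_vocab, probs)
```

Because both input distributions sum to one, the mixture is itself a valid distribution, and OOV named entities can be emitted verbatim from the source.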