1st International Workshop on Multimodal Understanding and Learning for Embodied Applications 2019
DOI: 10.1145/3347450.3357656

Geometry-aware Relational Exemplar Attention for Dense Captioning

Abstract: Dense captioning (DC), which provides a comprehensive context understanding of images by describing all salient visual groundings in an image, facilitates multimodal understanding and learning. As an extension of image captioning, DC is developed to discover richer sets of visual contents and to generate captions of wider diversity and increased details. The state-of-the-art models of DC consist of three stages: (1) region proposals, (2) region classification, and (3) caption generation for each proposal. They…

Cited by 2 publications (1 citation statement)
References 18 publications

“…At present, with the development of computer vision and natural language processing technology, as well as the continually improving computer technology, it is possible to obtain useful information about multiple objects from images automatically. In particular, dense captioning, a technology based on computer vision, is gaining traction in this field [3][4][5][6][7][8]. Dense captioning is a subset of image captioning technology [9] that understands the characteristics of objects, their activities, and relationships and expresses them in natural language.…”
Section: Introduction (mentioning)
confidence: 99%