2022
DOI: 10.1049/cvi2.12099

Semantic‐meshed and content‐guided transformer for image captioning

Abstract: The transformer architecture has become the dominant framework for image captioning because of its superior performance. However, existing transformer-based methods often lack integrated use of multi-level semantic information and are weak at keeping captions relevant to the image. In this paper, a semantic-meshed and content-guided transformer network is introduced for image captioning to solve these problems. The semantic-meshed mechanism allows the model to generate words by se…

Cited by 5 publications (1 citation statement)
References 46 publications

“…Computer vision is an important branch of computer technology, and it is a complex field. An important task of computer vision is to process the collected image and video information, and the processing effect can be similar to that of human processing [1][2][3]. This technology has broad applications, which involve high processing requirements, and some tasks require human assistance.…”
Section: Introduction (citation type: mentioning)
confidence: 99%