2020
DOI: 10.3390/app10196942

Learn and Tell: Learning Priors for Image Caption Generation

Abstract: In this work, we propose a novel priors-based attention neural network (PANN) for image captioning, which aims to incorporate two kinds of priors, i.e., the probabilities of local region proposals being mentioned (PBM priors) and part-of-speech clues for caption words (POS priors), into the visual information extraction process at each word prediction. This work was inspired by the intuitions that region proposals have different inherent probabilities for image captioning, and that the POS clues bridge the wo…

Citations: cited by 1 publication (1 citation statement)
References: 39 publications (57 reference statements)
“…They do not integrate the global features and the local features, which limits the improvement of the model description performance. Secondly, in the descriptive statement generation stage of the existing models [7, 13–15], the word vector with the highest probability among the candidate words is chosen by the model as the final word and output it directly as a sentence. However, the sentence obtained in this way is not necessarily the best descriptive result.…”
Section: Introduction
Citation type: mentioning
confidence: 99%