2023
DOI: 10.48550/arxiv.2301.11174
Preprint

Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

Cited by 1 publication (2 citation statements)
References 57 publications
“…To verify the effectiveness of our proposed model, we conduct tests on the MOCS test set and compare the results with those of other models. As presented in Table 1, we first calculate the metrics of BLEU@1, BLEU@4, METEOR, ROUGE_L, CIDEr, and SPICE based on cross-entropy loss using Equation (20), with the corresponding results shown in columns 2 through 7 of Table 1. Next, we calculate the metrics based on CIDEr-D optimization using Equation (21), and the corresponding results are shown in columns 8 through 13 of Table 1.…”
Section: 6.4.1 Experimental Results on MOCS (mentioning)
confidence: 99%
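The two training regimes this statement refers to (cross-entropy training and CIDEr-D optimization) are standard in the captioning literature. The following is a minimal PyTorch-style sketch of those two objectives, not a reconstruction of the citing paper's Equations (20) and (21); all names, shapes, and the SCST-style baseline are illustrative assumptions.

```python
# Hedged sketch: word-level cross-entropy vs. CIDEr-D reward optimization
# in the self-critical (SCST) style. Shapes and names are illustrative only.
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits, targets, pad_idx=0):
    """logits: (B, T, V) decoder outputs; targets: (B, T) ground-truth token ids."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_idx,
    )

def cider_d_scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """
    sample_logprobs: (B,) summed log-probs of sampled captions
    sample_reward:   (B,) CIDEr-D score of each sampled caption
    greedy_reward:   (B,) CIDEr-D score of the greedy (baseline) caption
    Policy-gradient loss using the greedy score as the reward baseline.
    """
    advantage = sample_reward - greedy_reward            # (B,)
    return -(advantage.detach() * sample_logprobs).mean()
```

In practice the CIDEr-D rewards would be produced by an external scorer (for example the Cider implementation in pycocoevalcap) applied to the sampled and greedy captions; BLEU, METEOR, ROUGE_L, and SPICE are then reported as evaluation metrics on the test set, as the quoted statement describes.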
“…In terms of model architecture, previous approaches to image captioning have predominantly relied on convolutional neural networks (CNN) and long short-term memory networks (LSTM). CNN is used for encoding spatial features, while LSTM decodes these features into textual descriptions, achieving certain success in image-captioning tasks [18,19,20] on public datasets. However, these architectures are limited by the expressive power and training efficiency of LSTM and cannot further improve their performance.…”
Section: Introduction (mentioning)
confidence: 99%
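The CNN-encoder plus LSTM-decoder pattern described in this statement is the classic captioning baseline. Below is a minimal, hedged PyTorch sketch of that pattern; the backbone choice, layer sizes, and module names are assumptions for illustration and are not taken from any of the cited works.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CNNLSTMCaptioner(nn.Module):
    """Classic CNN encoder + LSTM decoder captioning baseline (illustrative sizes)."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN encoder: ResNet-50 with the classification head removed.
        resnet = models.resnet50(weights=None)
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 2048, 1, 1)
        self.img_proj = nn.Linear(2048, embed_dim)
        # LSTM decoder over word embeddings, conditioned on the image feature.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)           # (B, 2048) spatial features pooled
        img_emb = self.img_proj(feats).unsqueeze(1)       # (B, 1, E)
        word_emb = self.embed(captions[:, :-1])           # teacher forcing: shifted caption
        inputs = torch.cat([img_emb, word_emb], dim=1)    # image feature fed as first step
        hidden, _ = self.lstm(inputs)                     # (B, T, H)
        return self.out(hidden)                           # (B, T, V) vocabulary logits
```

The logits from this forward pass would be trained with a cross-entropy objective like the one sketched above; the sequential LSTM decoding is the expressiveness and training-efficiency bottleneck that the quoted statement points to, which later attention- and Transformer-based captioners addressed.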