2020
DOI: 10.48550/arxiv.2007.09049
Preprint

Learning to Discretely Compose Reasoning Module Networks for Video Captioning

Cited by 4 publications (4 citation statements)
References 27 publications
“…The RMN model is from [15]. The ablation experiments in Table 1 demonstrate the effectiveness of the proposed gated fusion mechanism and the sentence length modulation loss function, where "RMN" corresponds to the experimental results reported in [15]. The methods proposed in this paper build on that benchmark model. First, the effectiveness of the gated fusion mechanism is verified.…”
Section: Analysis Of Ablation Experiment Results
confidence: 99%
“…We use RMN [59] as our captioning module, and extract features in the same manner as mentioned by the authors. Note that the localization module is kept frozen throughout the training process.…”
Section: Video Captioning
confidence: 99%
“…Along this line, many works improve video captioning by either designing a better visual encoder [16,40,12,1,15] or a better language decoder [79,50,73]. Of particular interest are [28,63], which combine a sequence-to-sequence model with traditional template-based methods for grounded video caption generation.…”
Section: Related Work
confidence: 99%