2020
DOI: 10.48550/arxiv.2006.04058
Preprint

NITS-VC System for VATEX Video Captioning Challenge 2020

Abstract: Video captioning is the process of summarising the content, events, and actions of a video into a short textual form, which can be helpful in many research areas such as video-guided machine translation, video sentiment analysis, and providing aid to needy individuals. In this paper, a system description of the framework used for the VATEX-2020 video captioning challenge is presented. We employ an encoder-decoder based approach in which the visual features of the video are encoded using a 3D convolutional neural network (C3…
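The abstract describes an encoder-decoder pipeline in which per-clip 3D-CNN visual features are encoded and a recurrent decoder emits caption tokens. The following is a minimal illustrative sketch of that general pattern, not the authors' implementation: the feature dimensions, the mean-pooling step, and the single-layer greedy decoder are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 16 video clips, each with a 4096-d C3D-style feature.
num_clips, feat_dim, hid_dim, vocab_size = 16, 4096, 256, 1000

# Encoder (sketch): project each clip feature and mean-pool over time
# to obtain a single video context vector.
W_enc = rng.standard_normal((feat_dim, hid_dim)) * 0.01
clip_feats = rng.standard_normal((num_clips, feat_dim))
context = np.tanh(clip_feats @ W_enc).mean(axis=0)  # shape: (hid_dim,)

# Decoder (sketch): one recurrent step per token, greedy argmax decoding.
W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.01
W_out = rng.standard_normal((hid_dim, vocab_size)) * 0.01

def greedy_decode(context, max_len=8):
    h = context
    tokens = []
    for _ in range(max_len):
        h = np.tanh(h @ W_h + context)  # condition each step on the video context
        logits = h @ W_out
        tokens.append(int(logits.argmax()))
    return tokens

caption_ids = greedy_decode(context)
```

A real system would train these weights end-to-end and map the token ids back to words through a learned vocabulary; the sketch only shows how the encoded video context drives each decoding step.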

Cited by 4 publications (5 citation statements)
References 15 publications
“…Other approaches such as [71], [43], and [108] participated in the VATEX video captioning challenge 2020 and report their results. The multi-features and hybrid reward strategy approach proposed in [108] was the winner of the video captioning competition and reports the highest result on the VATEX dataset.…”
Section: Results Of State-of-the-art Approaches
confidence: 99%
“…al. [71] proposed a video captioning framework in the VATEX challenge using two parallel LSTMs. The way of fusing the visual representation with an embedded representation of the reference caption differs between the two LSTMs.…”
Section: Recent
confidence: 99%
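The citation above notes that each of the two parallel LSTM branches fuses the visual representation with the embedded reference caption in a different way. The exact fusion operations are not specified in the quoted text; the sketch below illustrates two commonly used alternatives (concatenation-plus-projection versus elementwise product), which are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 64  # hypothetical shared embedding size

visual = rng.standard_normal(dim)       # pooled visual representation
caption_emb = rng.standard_normal(dim)  # embedded reference caption

# Branch 1 (assumed fusion): concatenate, then project back to `dim`.
W_cat = rng.standard_normal((2 * dim, dim)) * 0.1
fused_concat = np.tanh(np.concatenate([visual, caption_emb]) @ W_cat)

# Branch 2 (assumed fusion): elementwise (Hadamard) product.
fused_mul = np.tanh(visual * caption_emb)

# Each fused vector would then initialise or feed its own LSTM branch.
```

The point of using two distinct fusion schemes is that each branch exposes the decoder to a different interaction between the video and the text, which can then be combined or ensembled.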
“…More formally, we denote the task goal as T, the input video demonstration as V, and the target text script as S = {S_1, ..., S_n}, involving n necessary and ordered steps. Compared to action anticipation (Girdhar and Grauman 2021; Zhong et al. 2022) or video captioning (Singh, Singh, and Bandyopadhyay 2020), the generated scripts in our task are expected to be well-structured descriptions for a sequence of actions that follow a temporal and logical order.…”
Section: Dataset Design Task Formulation
confidence: 99%
“…Various other video description datasets depicting everyday activities have been presented [3, 6, 32, 52]. In this work, we mainly focus on the VATEX Captioning dataset [27], which has also been used in the Video-to-Text (VTT) task [17, 27, 42, 56-58]. Furthermore, we validate our models on the MSR-VTT [52] and MSVD [6] datasets.…”
Section: Related Work
confidence: 99%