2014
DOI: 10.48550/arxiv.1412.4729
Preprint

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Citation Types: 0 supporting, 163 mentioning, 0 contrasting
Year Published: 2016–2023

Cited by 103 publications (163 citation statements)
References 0 publications

“…Video Question Answering. In Table 3, zero-shot VideoCLIP outperforms most supervised methods on the DiDeMo dataset. Supervised baselines (R@1 ↑ / R@5 ↑): S2VT (Venugopalan et al., 2014) 11.9 / 33.6; FSE (Zhang et al., 2018) 13.9 / 44.5; CE (Liu et al., 2019a) 16.1 / 41.1; ClipBERT 20.4 / 48.0. Zero-shot: VideoCLIP …”
Section: Results (mentioning)
confidence: 99%
“…5. Specifically, one of the early works [104], which is only applicable for videos of short duration, employs mean-pooling to frame representations extracted by a shared CNN and utilizes an LSTM architecture for caption generation. To extend the validity of extracted features to longer durations, recurrent visual encoder architectures are used [94,105,106].…”
Section: Image and Video Captioning (mentioning)
confidence: 99%
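The mean-pooling-plus-LSTM pipeline described in this quote can be summarized in a short sketch. The snippet below is an illustrative PyTorch reconstruction, not the cited papers' actual code; the feature, embedding, and vocabulary sizes are assumptions chosen only to make the example runnable.

```python
# Illustrative sketch (PyTorch) of the mean-pooling captioner described above:
# per-frame CNN features are averaged into one clip vector, which conditions an
# LSTM that generates the caption word by word. Sizes are assumed for the example.
import torch
import torch.nn as nn

class MeanPoolCaptioner(nn.Module):
    def __init__(self, feat_dim=4096, embed_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.visual_proj = nn.Linear(feat_dim, embed_dim)  # project pooled CNN feature
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # per-step word logits

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, num_frames, feat_dim) from a shared frame-level CNN
        # captions:    (batch, seq_len) token ids of the target caption
        clip = frame_feats.mean(dim=1)                     # mean-pool over frames
        clip = self.visual_proj(clip).unsqueeze(1)         # (batch, 1, embed_dim)
        words = self.embed(captions)                       # (batch, seq_len, embed_dim)
        states, _ = self.lstm(torch.cat([clip, words], dim=1))
        return self.out(states[:, :-1, :])                 # logits aligned with captions

# Toy usage with random tensors standing in for real features and tokens:
model = MeanPoolCaptioner()
feats = torch.randn(2, 30, 4096)          # 2 clips, 30 frames each
caps = torch.randint(0, 10000, (2, 12))   # 2 captions of 12 tokens
logits = model(feats, caps)               # (2, 12, 10000)
```

The recurrent visual encoders mentioned at the end of the quote would replace the mean-pooling step with a second recurrent network run over the frame features, so that temporal order is preserved for longer videos.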
“…[35,36] tailor better recurrent layers that are easy to stack deep for higher-dimensional video information. These recurrent-based methods have advantages over convolutional ones for tasks sensitive to sequence order, such as video future prediction [32,56,67], trajectory prediction [47], and video description [4,55]. While for tasks that more focus on integrated features like action recognition [65,25,37,2,6,24,23], there is still a gap between the recurrent and convolutional models.…”
Section: Related Work (mentioning)
confidence: 99%
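As a rough illustration of stacking recurrent layers over video features, the snippet below runs a multi-layer LSTM over per-frame CNN features. It is a generic stand-in, not the specific layer designs proposed in the works cited as [35,36]; all sizes are assumptions.

```python
# Generic sketch of a stacked recurrent video encoder: a multi-layer LSTM run
# over per-frame CNN features (sizes are assumptions, not values from the papers).
import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=2048, hidden_size=512, num_layers=3, batch_first=True)

frame_feats = torch.randn(4, 16, 2048)      # 4 clips, 16 frames of 2048-d features
outputs, (h_n, c_n) = encoder(frame_feats)  # outputs: (4, 16, 512) per-frame states
clip_repr = h_n[-1]                         # (4, 512) final state of the top layer
```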