2014
DOI: 10.48550/arxiv.1411.4389
Preprint

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Cited by 65 publications
(121 citation statements)
“…1. The accuracy gap between ours and those reported in [7,25,1,20,26] is mainly caused by the availability of different input frames at test time.…”
Section: Experimental Setting | contrasting
confidence: 84%
“…There are many successful video recognition models. For instance, [7] proposed a class of recurrent long-term models that can be jointly trained to learn temporal dynamics and convolutional perceptual representations, and demonstrated superior performance on recognition and description of images and videos. [25] addressed the problem of learning spatio-temporal features for videos using 3D ConvNets.…”
Section: Related Work | mentioning
confidence: 99%
“…We explore three state-of-the-art action recognition methods: Long-term Recurrent Convolutional Networks (LRCN) [2], 3D Convolutional Networks (C3D) [12] and Temporal Shift Module (TSM) [7].…”
Section: Action Recognition Methods | mentioning
confidence: 99%
“…Our "self talk" framework has two "executives" that take their roles iteratively: 1) question generation, which is responsible for asking the right questions, and 2) question answering, which accepts the questions and generates potential answers. With the rapid development in computer vision and machine learning [Mao et al., 2014; Donahue et al., 2014; Karpathy and Li, 2014; Vinyals et al., 2014; Chen and Zitnick, 2014] there are a few tools developed for this seemingly intuitive philosophy in Artificial Intelligence, but self-talk is certainly beyond the aggregation of tools, because it is fundamentally a challenging chicken-and-egg problem.…”
[Figure 1 caption from the citing paper: One example self talk by the presented system, while the affirmative or questionable answer is decided by confidence score from visual answering executive.]
Section: Introduction | mentioning
confidence: 99%