2014 IEEE Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2014.100

Action Localization with Tubelets from Motion

Abstract: This paper considers the problem of action localization, where the objective is to determine when and where certain actions appear. We introduce a sampling strategy to produce 2D+t sequences of bounding boxes, called tubelets. Compared to state-of-the-art alternatives, this drastically reduces the number of hypotheses that are likely to include the action of interest. Our method is inspired by a recent technique introduced in the context of image localization. Beyond considering this technique for the first ti…

Cited by 243 publications (249 citation statements)
References 33 publications
“…In contrast, our model efficiently reduces the number of evaluated windows by encoding a sequence of visual descriptors. Spatio-Temporal Action Proposals: Recently, ideas from the area of object proposals have been extrapolated to action recognition in the video domain [6,17,11,21,24,10,39]. Most of these methods produce spatio-temporal object segments to perform spatio-temporal detection of simple or cyclic actions on short video sequences, hence their scalability to real-world scenarios is uncertain.…”
Section: Related Work (mentioning)
confidence: 99%
“…Most of these methods produce spatio-temporal object segments to perform spatio-temporal detection of simple or cyclic actions on short video sequences, hence their scalability to real-world scenarios is uncertain. These methods rely on straddling of voxels [6,17], reasoning over dense trajectories [24,39], or non real-time object proposals [11], which increase their computational cost and reduce their competitiveness at large scales. Temporal Action Proposals: Very recently, work emerged that focused on temporal segments which are likely to contain human actions [4,22,30].…”
Section: Related Work (mentioning)
confidence: 99%
“…proposed in [30,17]. In a localization task the objective is to carefully and completely identify what, when and where each object is in a video.…”
Section: Animal Counting (mentioning)
confidence: 99%
“…Thus, such methods are tuned towards human photographs, taken from a height of 1-2 meters with human-scale objects. Such objects can safely be assumed to consist of observable parts [11] or to be found by object-saliency methods (so called "object proposals"), tuned to human scale [1,15,17,31]. Yet, for drone imagery taken in high altitude (10-100m) the objects of interest are relatively small, questioning the suitability of current methods that use individual parts or object-saliency.…”
Section: Introduction (mentioning)
confidence: 99%