2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00043

Gaussian Temporal Awareness Networks for Action Localization

Abstract: Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have drawn inspiration from image object detection and extended its advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Nevertheless, the results can suffer from a robustness problem due to the design of predetermined temporal scales, which overlooks the temporal structure of an action and limits its utility for detecting actions with complex var…
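To make the criticized design concrete, here is a minimal Python sketch of SSD-style anchors with predetermined temporal scales on a 1D feature map. The function name `fixed_scale_anchors` and its parameters are hypothetical illustrations, not code from the paper or from SSD; the point is only that every candidate width comes from a fixed list chosen in advance.

```python
def fixed_scale_anchors(num_cells, clip_len, scales):
    """Return (start, end) anchors centred on each cell of a 1D feature map.

    Hypothetical sketch of anchor-based proposals:
    num_cells: length of the temporal feature map
    clip_len:  duration (e.g. seconds) covered by the feature map
    scales:    predetermined anchor widths, as fractions of clip_len
    """
    cell = clip_len / num_cells
    anchors = []
    for i in range(num_cells):
        center = (i + 0.5) * cell
        for s in scales:
            half = 0.5 * s * clip_len
            # Clip each anchor to the valid time range [0, clip_len].
            anchors.append((max(0.0, center - half), min(clip_len, center + half)))
    return anchors


# Two fixed widths (2 s and 4 s) at each of 4 cells over an 8 s clip.
anchors = fixed_scale_anchors(4, 8.0, [0.25, 0.5])
```

An action whose true duration falls between the listed widths can only be approximated by the nearest anchor plus a regression offset, which is the robustness concern the abstract raises.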

Cited by 340 publications (155 citation statements)
References 36 publications
“…But SS-TAD adopts the anchor mechanism and the stacked GRU units. Recently, Fuchen Long et al introduces GTAN (Gaussian Temporal Awareness Networks) [25] which integrates temporal structure to one-stage action localization. In GTAN, it introduces Gaussian kernels to optimize temporal scale of every action proposal dynamically.…”
Section: ) One-stage Localization Methodsmentioning
confidence: 99%
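The cited description, a Gaussian kernel that sets each proposal's temporal scale instead of a fixed anchor width, can be sketched as follows. This is a hedged illustration, not the GTAN authors' implementation: it assumes each cell of a 1D feature map has a centre `mu` (in cell units) and a standard deviation `sigma` that a network would predict, and that the same kernel weights features when pooling. The helper names and the width factor `k` are hypothetical.

```python
import math


def gaussian_weights(num_cells, mu, sigma):
    """Normalised Gaussian weights over cell positions 0..num_cells-1."""
    raw = [math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) for t in range(num_cells)]
    total = sum(raw)
    return [w / total for w in raw]


def gaussian_pool(features, mu, sigma):
    """Gaussian-weighted average of per-cell feature vectors around centre mu."""
    w = gaussian_weights(len(features), mu, sigma)
    dim = len(features[0])
    return [sum(w[t] * features[t][d] for t in range(len(features))) for d in range(dim)]


def gaussian_interval(mu, sigma, k=1.0, num_cells=None):
    """Proposal boundaries mu +/- k*sigma (k is a hypothetical width factor).

    If num_cells is given, the interval is clipped to [0, num_cells].
    """
    start, end = mu - k * sigma, mu + k * sigma
    if num_cells is not None:
        start, end = max(0.0, start), min(float(num_cells), end)
    return start, end


# A larger predicted sigma yields a longer proposal and a broader pooling window.
short_action = gaussian_interval(10.0, 2.0)  # (8.0, 12.0)
long_action = gaussian_interval(10.0, 5.0)   # (5.0, 15.0)
```

Because `sigma` is a continuous, per-cell quantity rather than an entry from a preset list, the temporal extent can adapt to each action, which is the contrast the citation draws with anchor-based one-stage detectors.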
“…Furthermore, some region-based methods (such as R-C3D [23] and TAL-Net [24]) propose to generalize the methods for 2D object detection to 1D temporal action localization. Recently, TSA-Net [26] and Gaussian temporal modeling [25] are proposed for accurate action localization. The following are the performance comparisons.…”
Section: ) Current Representative Methodsmentioning
confidence: 99%
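Generalizing 2D detection to 1D essentially replaces box overlap with the overlap of two time intervals. The following is a minimal sketch of temporal IoU and greedy 1D non-maximum suppression in plain Python, offered as a generic illustration under that assumption; it is not code from R-C3D, TAL-Net, or GTAN, and the function names are hypothetical.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two 1D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments, scores, iou_thresh=0.5):
    """Greedy 1D non-maximum suppression; returns indices of kept segments."""
    # Visit segments from highest to lowest score.
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a segment only if it does not overlap any kept one too strongly.
        if all(temporal_iou(segments[i], segments[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# The second segment heavily overlaps the first and is suppressed.
kept = temporal_nms([(0, 10), (1, 11), (20, 30)], [0.9, 0.8, 0.7])
```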
“…Gkioxari et al generate action proposals by filtering Selective Search boxes with motion saliency, and fuse motion and temporal decision using an SVM [14]. More recent approaches, devised end-to-end tunable architectures integrating region proposal networks in their model [4,13,23,26,35,44,47,48,50,65]. As discussed in [47], most action detection works do not deal with untrimmed sequences and do not generate action tubes.…”
Section: Related Work (mentioning; confidence: 99%)
“…Action progress prediction is an extremely challenging task since, to be of maximum utility, the prediction should be made online while observing the video. While a thick crop of literature addresses action detection and spatio-temporal localization [12,13,23,26,35,44,59,62,66], predicting action progress is more closely related to online action detection [6,22,28,50,61]. Here the goal is to accurately detect, as soon as possible, when an action has started and when it has finished, but they do not have a model to estimate the progress.…”
Section: Introduction (mentioning; confidence: 99%)
“…This trend encourages the development of effective and efficient algorithms to intelligently parse video data [1,2,3,4,5,6] and discover semantic information [7,8]. One fundamental challenge underlying the success of these advances is action detection from videos in both temporal [9,10] and spatio-temporal aspects [11]. In this study, we focus on the temporal action detection task, which aims to find the exact time stamps of an action's start and end time, and recognize the category of the action.…”
Section: Introduction (mentioning; confidence: 99%)