Gaussian Temporal Awareness Networks for Action Localization

Long, Fuchen; Yao, Ting; Qiu, Zhaofan; Tian, Xinmei; Luo, Jiebo; Mei, Tao

doi:10.1109/cvpr.2019.00043

Cited by 340 publications

(155 citation statements)

References 36 publications

“…But SS-TAD adopts the anchor mechanism and the stacked GRU units. Recently, Fuchen Long et al introduces GTAN (Gaussian Temporal Awareness Networks) [25] which integrates temporal structure to one-stage action localization. In GTAN, it introduces Gaussian kernels to optimize temporal scale of every action proposal dynamically.…”

Section: One-stage Localization Methods (mentioning)

confidence: 99%

“…Furthermore, some region-based methods (such as R-C3D [23] and TAL-Net [24]) propose to generalize the methods for 2D object detection to 1D temporal action localization. Recently, TSA-Net [26] and Gaussian temporal modeling [25] are proposed for accurate action localization. The following are the performance comparisons.…”

Section: Current Representative Methods (mentioning)

confidence: 99%

A Survey on Temporal Action Localization

Xia; Zhan (2020). IEEE Access

Temporal action localization is one of the most crucial and challenging problems for video understanding in computer vision. It has received a lot of attention in recent years because of the extensive application of daily life. Temporal action localization has made some significant progress, especially with the development of deep learning recently. And more demand is for temporal action localization in untrimmed videos. In this paper, our target is to survey the state-of-the-art techniques and models for video temporal action localization. It mainly includes the related techniques, some benchmark datasets and the evaluation metrics of temporal action localization. In addition, we summarize temporal action localization from two aspects: fully-supervised learning and weakly-supervised learning. And we list several representative works and compare their performances respectively. Finally, we make some deep analysis and propose potential research directions, and conclude the survey.

“…Gkioxari et al generate action proposals by filtering Selective Search boxes with motion saliency, and fuse motion and temporal decision using an SVM [14]. More recent approaches, devised end-to-end tunable architectures integrating region proposal networks in their model [4,13,23,26,35,44,47,48,50,65]. As discussed in [47], most action detection works do not deal with untrimmed sequences and do not generate action tubes.…”

Section: Related Work (mentioning)

confidence: 99%

“…Action progress prediction is an extremely challenging task since, to be of maximum utility, the prediction should be made online while observing the video. While a thick crop of literature addresses action detection and spatio-temporal localization [12,13,23,26,35,44,59,62,66], predicting action progress is more closely related to online action detection [6,22,28,50,61]. Here the goal is to accurately detect, as soon as possible, when an action has started and when it has finished, but they do not have a model to estimate the progress.…”

Section: Introduction (mentioning)

confidence: 99%

Am I Done? Predicting Action Progress in Videos

Becattini; Uricchio; Seidenari; et al. (2020). ACM Trans. Multimedia Comput. Commun. Appl.

In this article, we deal with the problem of predicting action progress in videos. We argue that this is an extremely important task, since it can be valuable for a wide range of interaction applications. To this end, we introduce a novel approach, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution. To provide a general definition of action progress, we ground our work in the linguistics literature, borrowing terms and concepts to understand which actions can be the subject of progress estimation. As a result, we define a categorization of actions and their phases. Motivated by the recent success obtained from the interaction of Convolutional and Recurrent Neural Networks, our model is based on a combination of the Faster R-CNN framework, to make framewise predictions, and LSTM networks, to estimate action progress through time. After introducing two evaluation protocols for the task at hand, we demonstrate the capability of our model to effectively predict action progress on the UCF-101 and J-HMDB datasets.

“…This trend encourages the development of effective and efficient algorithms to intelligently parse video data [1,2,3,4,5,6] and discover semantic information [7,8]. One fundamental challenge underlying the success of these advances is action detection from videos in both temporal [9,10] and spatio-temporal aspects [11]. In this study, we focus on the temporal action detection task, which aims to find the exact time stamps of an action's start and end time, and recognize the category of the action.…”

Section: Introduction (mentioning)

confidence: 99%

Decoupling Localization and Classification in Single Shot Temporal Action Detection

Huang; Dai (2019). 2019 IEEE International Conference on Multimedia and Expo (ICME)

Video temporal action detection aims to temporally localize and recognize the action in untrimmed videos. Existing one-stage approaches mostly focus on unifying two subtasks, i.e., localization of action proposals and classification of each proposal through a fully shared backbone. However, such design of encapsulating all components of two subtasks in one single network might restrict the training by ignoring the specialized characteristic of each subtask. In this paper, we propose a novel Decoupled Single Shot temporal Action Detection (Decouple-SSAD) method to mitigate such problem by decoupling the localization and classification in a one-stage scheme. Particularly, two separate branches are designed in parallel to enable each component to own representations privately for accurate localization or classification. Each branch produces a set of action anchor layers by applying deconvolution to the feature maps of the main stream. High-level semantic information from deeper layers is thus incorporated to enhance the feature representations. We conduct extensive experiments on THUMOS14 dataset and demonstrate superior performance over state-of-the-art methods. Our code is available online.
