2017 IEEE International Conference on Computer Vision (ICCV), 2017
DOI: 10.1109/iccv.2017.619

TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal

citations
Cited by 58 publications
(37 citation statements)
references
References 21 publications
“…In their model, each complete activity instance is considered as a composition of three major stages, namely starting, course, and ending, and they introduced structured temporal pyramid pooling to produce a global representation of the entire proposal. Differently from previous methods, Zhu et al [179] proposed a framework that integrates the complementary spatial and temporal information into an end-to-end trainable system for video action proposal, and a novel and efficient path trimming method is proposed to handle untrimmed video by examining actionness and background score pattern without using extra detectors. To generalize R-CNN from 2D to 3D, Hou et al [52] proposed an end-to-end 3D CNN-based approach for action detection in videos.…”
Section: CNN-based Approach
Citation type: mentioning
confidence: 99%
“…These methods [9,39] are based on grouping techniques over low-level primitives such as color and motion, which reaffirms our intuition about the relevance of actors as a strong cue for the localization of the actions. The state of the art approaches for the generation of action proposals [44,41] are fully supervised based on a mix of convolutional and recurrent stages and supervised instance level tracking, respectively. Although these works generate action proposals of better quality, they do so at the cost of a significant amount of additional supervision.…”
Section: Quality of Actor Proposals
Citation type: mentioning
confidence: 99%
“…Singh et al [17] presented a deep-learning framework based on SSD with an efficient online algorithm to incrementally construct and label action tubes from the SSD frame level detections for real-time multiple spatio-temporal action localization and classification. Zhu et al [20] proposed a spatio-temporal convolutional network which consists of a temporal convolutional regression network and a spatial regression network by empowering convolutional LSTM with regression capability. Escorcia et al [12] developed an actor-supervised architecture that exploits the inherent compositionality of actions in terms of actor transformations to localize actions.…”
Section: Related Work
Citation type: mentioning
confidence: 99%