2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.317

Temporal Action Detection with Structured Segment Networks

Abstract: Detecting actions in untrimmed videos is an important yet challenging task. In this paper, we present the structured segment network (SSN), a novel framework which models the temporal structure of each action instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, respectively for classifying actions and determining completeness. This allows the framework to effectively distinguish positive proposals from background o…
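
To make the two ideas in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes each proposal is already represented as a list of per-snippet feature vectors, pools them over a small temporal pyramid, and combines an action score with a completeness score by multiplication. The function names, the mean pooling, the two-level pyramid, and the product rule are all illustrative assumptions.

```python
from typing import List, Sequence

Feature = List[float]


def mean_pool(snippets: Sequence[Feature]) -> Feature:
    """Average a non-empty run of snippet features element-wise."""
    n = len(snippets)
    return [sum(column) / n for column in zip(*snippets)]


def temporal_pyramid(snippets: Sequence[Feature], levels: int = 2) -> Feature:
    """Pool features at several temporal granularities and concatenate them.

    Level k splits the proposal into k consecutive parts and mean-pools each
    part. The number of levels is an assumption made for illustration.
    """
    if not snippets:
        raise ValueError("a proposal needs at least one snippet")
    pooled: Feature = []
    for parts in range(1, levels + 1):
        for i in range(parts):
            lo = i * len(snippets) // parts
            # Guarantee a non-empty slice even for very short proposals.
            hi = max(lo + 1, (i + 1) * len(snippets) // parts)
            pooled.extend(mean_pool(snippets[lo:hi]))
    return pooled


def detection_score(action_prob: float, completeness_prob: float) -> float:
    """Combine the two classifier outputs (hypothetical product rule).

    A proposal ranks highly only if it both looks like the action and
    appears to cover a complete instance of it.
    """
    return action_prob * completeness_prob


if __name__ == "__main__":
    proposal = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(temporal_pyramid(proposal, levels=2))
    print(detection_score(0.9, 0.5))
```

Under these assumptions, a proposal that covers only a fragment of an action can still receive a high action score but is suppressed by a low completeness score, which is the behaviour the abstract attributes to the decomposed model.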

Cited by 871 publications (655 citation statements)
References 53 publications
“…As a task we pay more attention to, the step localization is related to the area of action detection, where promising progress has also been achieved recently [39], [40], [42], [43]. For example, Zhao et al [80] developed structured segment networks (SSN) to model the temporal structure of each action instance with a structured temporal pyramid. Xu et al [74] introduced a Region Convolutional 3D Network (R-C3D) architecture, which was built on C3D [71] and Faster R-CNN [51], to explore the region information of video frames.…”
Section: Replace a Bulb Install A Ceiling Fan (mentioning)
confidence: 99%
“…Bottom-up Aggregation: As our method is built upon the proposal-based action detection methods, we start with training an existing action detector, e.g., SSN [80], on our COIN dataset. During inference phase, given an input video, we send it into the action detector to produce a series of proposals with their corresponding locations and predicted scores.…”
Section: Task-consistency Analysis (mentioning)
confidence: 99%