Proceedings of the 25th ACM International Conference on Multimedia 2017
DOI: 10.1145/3123266.3123343

Single Shot Temporal Action Detection

Abstract: Temporal action detection is a very important yet challenging problem, since videos in real applications are usually long, untrimmed and contain multiple action instances. This problem requires not only recognizing action categories but also detecting start time and end time of each action instance. Many state-of-the-art methods adopt the "detection by classification" framework: first do proposal, and then classify proposals. The main drawback of this framework is that the boundaries of action instance proposa…

Cited by 450 publications
(311 citation statements)
References 47 publications
“…Temporal Action localization has attracted increasing attention in the last several years [6,18,26,33,34]. Inspired by the success of object detection, most current action detection methods resort to the two-stage pipeline: they first generate a set of 1D temporal proposals and then perform classification and temporal boundary regression on each proposal individually.…”
Section: Introduction (mentioning)
confidence: 99%
“…Structured segment network (SSN) [25] proposes to model the temporal structure of each action instance via a structured temporal pyramid. SSAD [26] is proposed to use 1D temporal convolutional layers to skip the proposal generation step via directly detecting action instances in untrimmed videos. There are also some deep networks generating temporal proposals by probability estimation, such as Boundary Sensitive Network [27].…”
Section: B Action Localization (mentioning)
confidence: 99%
“…We compare the following advanced approaches: (1) Structure Segment Network (SSN) [45] generates action proposals by temporal actionness grouping. (2) Single Shot Action Detection (SSAD) [19] is the 1D variant version of Single Shot Detection [23], which generates action proposals by multiple temporal anchor layers. (3) Convolution-De-Convolution Network (CDC) [31] builds a 3D Conv-Deconv network to precisely localize the boundary of action instances at frame level.…”
Section: Evaluation On Temporal Action Proposal (mentioning)
confidence: 99%
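
The statement above describes SSAD as a 1D variant of SSD that uses multiple temporal anchor layers. A minimal Python sketch of that general idea follows: default temporal anchors are placed at every position of several 1D feature maps, and a temporal IoU is used to compare an anchor with a ground-truth segment. The function names, the layer sizes (16, 8, 4) and the scale ratios are illustrative assumptions, not values taken from the SSAD paper.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in normalized video time [0, 1]


def temporal_anchors(num_cells: int, base_scale: float,
                     ratios: List[float]) -> List[Segment]:
    """Default 1D anchors for one temporal feature map (hypothetical helper).

    Each of the num_cells positions gets one anchor per ratio, centered on
    the cell and with width base_scale * ratio.
    """
    anchors = []
    for i in range(num_cells):
        center = (i + 0.5) / num_cells
        for r in ratios:
            width = base_scale * r
            anchors.append((center - width / 2, center + width / 2))
    return anchors


def multi_layer_anchors(layer_cells: List[int],
                        ratios: List[float]) -> List[Segment]:
    """Collect anchors from several layers; coarser layers get wider anchors.

    The base scale 1 / cells is an assumption made for this sketch.
    """
    all_anchors = []
    for cells in layer_cells:
        all_anchors.extend(temporal_anchors(cells, 1.0 / cells, ratios))
    return all_anchors


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1D segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Illustrative usage: three layers with 16, 8 and 4 temporal positions.
anchors = multi_layer_anchors([16, 8, 4], [0.5, 1.0, 2.0])
ground_truth = (0.30, 0.55)
best = max(anchors, key=lambda seg: temporal_iou(seg, ground_truth))
```

In a detector of this kind, each anchor would additionally receive class scores and boundary offsets from a prediction layer; the sketch covers only anchor placement and matching by overlap.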
“…One natural way of temporal action localization is to extend image object detection frameworks, e.g., SSD [23] or Faster R-CNN [29], for producing spatial bounding boxes in a 2D image to temporal localization of an action in a 1D sequence [4,19]. The upper part of Figure 1 conceptualizes a typical process of one-stage action localization.…”
Section: Introductionmentioning
confidence: 99%