2017
DOI: 10.1007/978-3-319-54184-6_24
Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking

Abstract: In this paper, we address the problem of searching action proposals in unconstrained video clips. Our approach starts from actionness estimation on frame-level bounding boxes, and then aggregates the bounding boxes belonging to the same actor across frames via linking, associating, and tracking to generate spatio-temporally continuous action paths. To achieve this, a novel actionness estimation method is first proposed by utilizing both human appearance and motion cues. Then, the association of th…
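The abstract's linking step — aggregating per-frame boxes into a continuous action path — can be illustrated with a minimal greedy sketch. This is an assumption-laden toy version, not the authors' actual formulation: box tuples, actionness scores, and the linking criterion (actionness plus IoU overlap with the previous box) are illustrative choices.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def link_path(frames):
    """Greedily link per-frame detections into one action path.

    frames: list over time; each entry is a list of (box, actionness)
    candidates. The path starts from the highest-actionness box in the
    first frame, then in each later frame picks the candidate maximizing
    actionness + IoU with the previously chosen box.
    """
    path = [max(frames[0], key=lambda d: d[1])]
    for dets in frames[1:]:
        prev_box = path[-1][0]
        path.append(max(dets, key=lambda d: d[1] + iou(prev_box, d[0])))
    return path
```

For example, a slowly drifting box is preferred over a distant box with a slightly higher score, because the IoU term rewards spatial continuity — the intuition behind linking boxes of the same actor across frames.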

Cited by 6 publications (17 citation statements)
References 27 publications
“…For the UCF-101 dataset, our method outperforms the state-of-the-art [15] by 20% or more across all IoU ranges. Although Li et al. use a deep network (RPN) [19], they use only one stream, and their performance is only 4% better than the unsupervised method APT [14] in terms of recall, as shown in Table III.…”
Section: B. Comparison To State-of-the-arts
Confidence: 92%
“…Although the aforementioned methods have greatly advanced the quality of action proposals, they still have limitations. Specifically, most of these works [3]-[5], [12], [14] either produce action proposals frame by frame individually, which ignores the interplay between appearance, motion, and temporal context among adjacent frames, or arrange spatial information learning and temporal context learning as isolated processes [15], [29], which produces less satisfactory results. Moreover, most of these methods work on trimmed videos [3]-[5], [14].…”
Section: Action Proposals
Confidence: 99%