2017
DOI: 10.1007/978-3-319-54184-6_24
Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking

Abstract: In this paper, we address the problem of searching action proposals in unconstrained video clips. Our approach starts from actionness estimation on frame-level bounding boxes, and then aggregates the bounding boxes belonging to the same actor across frames via linking, associating, and tracking to generate spatio-temporally continuous action paths. To achieve this, a novel actionness estimation method is first proposed by utilizing both human appearance and motion cues. Then, the association of th…
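The abstract's linking step — aggregating per-frame boxes into a continuous action path — can be illustrated with a minimal greedy sketch. This is an assumption-laden toy version, not the authors' actual formulation: box tuples, actionness scores, and the linking criterion (actionness plus IoU overlap with the previous box) are illustrative choices.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def link_path(frames):
    """Greedily link per-frame detections into one action path.

    frames: list over time; each entry is a list of (box, actionness)
    candidates. The path starts from the highest-actionness box in the
    first frame, then in each later frame picks the candidate maximizing
    actionness + IoU with the previously chosen box.
    """
    path = [max(frames[0], key=lambda d: d[1])]
    for dets in frames[1:]:
        prev_box = path[-1][0]
        path.append(max(dets, key=lambda d: d[1] + iou(prev_box, d[0])))
    return path
```

For example, a slowly drifting box is preferred over a distant box with a slightly higher score, because the IoU term rewards spatial continuity — the intuition behind linking boxes of the same actor across frames.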

Cited by 6 publications (17 citation statements)
References 27 publications
“…For the UCF-101 dataset, our method outperforms the state-of-the-art [15] by 20% or more across all IoU ranges. Although Li et al. use a deep network (RPN) [19], they use only one stream, and their performance is only 4% better than the unsupervised method APT [14] in terms of recall, as shown in Table III.…”
Section: B. Comparison To State-of-the-arts
Confidence: 92%
“…Although the aforementioned methods have greatly advanced the quality of action proposals, they still have limitations. Specifically, most of these works [3]-[5], [12], [14] either produce action proposals frame by frame individually, which ignores the interplay between appearance, motion, and temporal context among adjacent frames, or arrange spatial information learning and temporal context learning as isolated processes [15], [29], which produces less satisfactory results. Moreover, most of these methods work on trimmed videos [3]-[5], [14].…”
Section: Action Proposals
Confidence: 99%