Weakly Supervised Temporal Action Detection with Shot-Based Temporal Pooling Network

Su, Haisheng; Zhao, Xu; Lin, Tianwei; Fei, Haiping

doi:10.1007/978-3-030-04212-7_37

Cited by 8 publications

(2 citation statements)

References 16 publications

“…In particular, we adopt a block-based processing strategy to obtain a video-level classification score, and propose a novel metric function as a similarity measure between activity portions of the videos. Su et al. [38] proposed shot-based sampling instead of uniform sampling and designed a multi-stage temporal pooling network for action localization. Zeng et al. [49] proposed an iterative training strategy to use not only the most discriminative action instances but also the less discriminative ones.…”

Section: Related Work (mentioning)

confidence: 99%

Weakly Supervised Temporal Action Localization Using Deep Metric Learning

Islam; Radke

2020

Preprint

Temporal action localization is an important step towards video understanding. Most current action localization methods depend on untrimmed videos with full temporal annotations of action instances. However, it is expensive and time-consuming to annotate both action labels and temporal boundaries of videos. To this end, we propose a weakly supervised temporal action localization method that only requires video-level action instances as supervision during training. We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances. We jointly optimize a balanced binary cross-entropy loss and a metric loss using a standard backpropagation algorithm. Extensive experiments demonstrate the effectiveness of both of these components in temporal localization. We evaluate our algorithm on two challenging untrimmed video datasets: THUMOS14 and ActivityNet1.2. Our approach improves the current state-of-the-art result for THUMOS14 by 6.5% mAP at IoU threshold 0.5, and achieves competitive performance for ActivityNet1.2.
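The abstract reports mAP at an IoU threshold of 0.5. As a minimal sketch of what that threshold means, the snippet below computes the temporal IoU between one predicted segment and one ground-truth segment. The function names `temporal_iou` and `is_hit` and the `(start, end)` tuple layout in seconds are assumptions made for illustration; they do not come from the paper, and this is not the full mAP computation.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in seconds; layout is an assumption


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    # Overlap is clipped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred: Segment, gt: Segment, threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth instance if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold
```

For example, a prediction of (0, 10) against a ground truth of (5, 15) overlaps for 5 seconds out of a 15-second union, so it would not count as a match at threshold 0.5.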

“…Given an input video $X_v$, we form unit sequence $U$ and extract corresponding feature sequence $F$ with length $l_u$. Since an untrimmed video usually comes up with extremely irrelevant frames with action instances only occupying small parts, we totally test three sampling methods to simplify the feature sequence for computational cost reduction: (1) uniform sampling: units are extracted with a regular interval $\sigma$ from $U$, thus the final unit sequence and feature sequence are $U' = \{u'_j\}_{j=1}^{l'_u}$ and $F'$ separately, where $l'_u = l_u / \sigma$; (2) sparse sampling: we first divide the video into $P$ segments $\{S_1, S_2, \ldots, S_P\}$ with equal length, then during each training epoch, we randomly sample one unit from each segment to form the unit sequence of length $P$; (3) shot-based sampling: considering the action structure, we sample the unit sequence $U$ based on action shots, which are generated by shot boundary detector [28]. Evaluation results of these sampling methods are shown in Section 4.3.…”

Section: Training of CPMN (mentioning)

confidence: 99%
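The three sampling strategies named in the quoted passage above (uniform, sparse, shot-based) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the cited authors' implementation: the function names are hypothetical, units are represented as a plain Python sequence, and the shot boundary detector is not implemented, so shot start indices are passed in as input. Picking the middle unit of each shot is also an assumption, since the passage does not say how units are chosen within a shot.

```python
import random
from typing import List, Sequence


def uniform_sampling(units: Sequence, sigma: int) -> List:
    """Keep one unit every `sigma` positions (regular interval)."""
    if sigma < 1:
        raise ValueError("sigma must be >= 1")
    return list(units[::sigma])


def sparse_sampling(units: Sequence, num_segments: int,
                    rng: random.Random) -> List:
    """Split the sequence into P near-equal segments and draw one
    random unit from each; call once per training epoch."""
    n = len(units)
    if not 1 <= num_segments <= n:
        raise ValueError("need 1 <= num_segments <= len(units)")
    sampled = []
    for p in range(num_segments):
        start = p * n // num_segments
        end = (p + 1) * n // num_segments  # exclusive upper bound
        sampled.append(units[rng.randrange(start, end)])
    return sampled


def shot_based_sampling(units: Sequence,
                        shot_starts: Sequence[int]) -> List:
    """Take the middle unit of each shot.

    ASSUMPTION: `shot_starts` holds sorted start indices produced by
    some external shot boundary detector; middle-unit selection is an
    illustrative choice, not taken from the source.
    """
    edges = list(shot_starts) + [len(units)]
    return [units[(edges[i] + edges[i + 1]) // 2]
            for i in range(len(edges) - 1)
            if edges[i] < edges[i + 1]]
```

With 12 units and `sigma = 3`, uniform sampling keeps 4 units, matching the reduced length $l_u / \sigma$ described in the passage; sparse sampling with `num_segments = 4` also returns 4 units, but a different random draw each epoch.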

Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization

Zhao; Lin

2019

Lecture Notes in Computer Science

Self Cite

Weakly supervised temporal action localization, which aims at temporally locating action instances in untrimmed videos using only video-level class labels during training, is an important yet challenging problem in video analysis. Many current methods adopt the "localization by classification" framework: first do video classification, then locate the temporal area contributing most to the results. However, this framework fails to locate entire action instances and gives little consideration to the local context. In this paper, we present a novel architecture called Cascaded Pyramid Mining Network (CPMN) to address these issues using two effective modules. First, to discover the entire temporal interval of a specific action, we design a two-stage cascaded module with the proposed Online Adversarial Erasing (OAE) mechanism, where new and complementary regions are mined by feeding the erased feature maps of discovered regions back to the system. Second, to exploit hierarchical contextual information in videos and reduce missing detections, we design a pyramid module which produces a scale-invariant attention map by combining the feature maps from different levels. Finally, we aggregate the results of the two modules to perform action localization by locating high-score areas in the temporal Class Activation Sequence (CAS). Extensive experiments conducted on the THUMOS14 and ActivityNet-1.3 datasets demonstrate the effectiveness of our method.

SGLP-Net: Sparse Graph Label Propagation Network for Weakly-Supervised Temporal Action Localization

Wu; Song

2023

Lecture Notes in Computer Science
