Background Suppression Network for Weakly-Supervised Temporal Action Localization

Lee, Pilhyeon; Uh, Youngjung; Byun, Hyeran

doi:10.1609/aaai.v34i07.6793

Cited by 219 publications

(190 citation statements)

References 18 publications

Supporting

Mentioning: 189

Contrasting

“…These results on THUMOS'14 are summarized in Table 2. Our method outperforms all weakly supervised methods except BaSNet [17], against which it shows a slight performance decrease while being more data efficient and having a simpler network design. Besides, our iterative approach takes around 4.6 minutes to train even on CPU.…”

Section: Methods (mentioning)

confidence: 88%

PUNet: Temporal Action Proposal Generation With Positive Unlabeled Learning Using Key Frame Annotations

Zia; Kayhan; Gemert

2021

2021 IEEE International Conference on Image Processing (ICIP)

Popular approaches to classifying action segments in long, realistic, untrimmed videos start with high quality action proposals. Current action proposal methods based on deep learning are trained on labeled video segments. Obtaining annotated segments for untrimmed videos is time consuming, expensive and error-prone, as annotated temporal action boundaries are imprecise, subjective and inconsistent. By embracing this uncertainty, we explore significantly speeding up temporal annotation by using just a single key frame label for each action instance instead of the inherently imprecise start and end frames. To tackle the class imbalance caused by using only a single frame, we evaluate an extremely simple Positive-Unlabeled algorithm (PU-learning). We demonstrate on THUMOS'14 and ActivityNet that using a single key frame label gives good results while being significantly faster to annotate. In addition, we show that our simple method, PUNet, is data-efficient, which further reduces the need for expensive annotations.

“…At the same time, they propose a scheme generating a hard negative video for separating contexts. Although the main point of this article is not the background class, it inspires the next subsequent three works that are BaSNet [36], background modeling [37], and LPAT [38]. Without considering the background category, the background frames were misclassified into action categories, resulting in a large number of FPs.…”

Section: Current Representative Methods (mentioning)

confidence: 99%

A Survey on Temporal Action Localization

Xia; Zhan

2020

IEEE Access

Temporal action localization is one of the most crucial and challenging problems for video understanding in computer vision. It has received a lot of attention in recent years because of its extensive applications in daily life. Temporal action localization has made significant progress, especially with the recent development of deep learning, and there is growing demand for temporal action localization in untrimmed videos. In this paper, our target is to survey the state-of-the-art techniques and models for video temporal action localization. The survey mainly covers the related techniques, some benchmark datasets and the evaluation metrics of temporal action localization. In addition, we summarize temporal action localization from two aspects: fully-supervised learning and weakly-supervised learning. We list several representative works and compare their performances respectively. Finally, we provide an in-depth analysis, propose potential research directions, and conclude the survey.

“…First, we select the initial seed for the clustering algorithm. Inspired by researches in information theory [43], [44], we think a feature vector to be more informative if its magnitude is larger. Thus, we calculate the L1 norm for all vectors, sort them and select the one corresponding to the median as the initial seed for the clustering algorithm.…”

Section: Instance Segmentation Network Equipped With the Region Normalization Mechanism (mentioning)

confidence: 99%

Performing Weakly Supervised Retail Instance Segmentation via Region Normalization

Wang

2021

IEEE Access
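
The seed-selection step quoted in the citation statement above (compute the L1 norm of every feature vector, sort, and pick the vector at the median) can be illustrated with a minimal sketch. This is not the citing authors' code: the function names `l1_norm` and `median_norm_seed`, the toy `features` list, and the tie-breaking rule for an even number of vectors (the lower of the two middle elements) are all assumptions made for illustration.

```python
from typing import List, Sequence


def l1_norm(vec: Sequence[float]) -> float:
    """L1 norm: the sum of the absolute values of the components."""
    return sum(abs(x) for x in vec)


def median_norm_seed(vectors: List[Sequence[float]]) -> int:
    """Return the index of the vector whose L1 norm is the median.

    Assumption: with an even number of vectors, the lower of the two
    middle elements is chosen; the quoted text does not specify this.
    """
    if not vectors:
        raise ValueError("need at least one feature vector")
    # Sort indices by L1 norm; sorted() is stable, so ties keep input order.
    order = sorted(range(len(vectors)), key=lambda i: l1_norm(vectors[i]))
    return order[(len(order) - 1) // 2]


# Hypothetical toy features with L1 norms 7, 1, 2, 10 and 2.5.
features = [[3.0, -4.0], [0.5, 0.5], [1.0, -1.0], [10.0, 0.0], [0.0, 2.5]]
seed_index = median_norm_seed(features)
print(seed_index, features[seed_index])  # prints: 4 [0.0, 2.5]
```

Returning an index rather than the vector itself keeps the sketch independent of whichever clustering routine would consume the seed.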
