2020
DOI: 10.1609/aaai.v34i07.6793

Background Suppression Network for Weakly-Supervised Temporal Action Localization

Abstract: Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given in the training stage while the only hint is video-level labels: whether each video contains action frames of interest. Previous methods aggregate frame-level class scores to produce video-level prediction and learn from video-level action labels. This formulation does not fully model the problem in that background frames are forced to be misclassified as action classes to predict video-level la…
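The aggregation step the abstract attributes to previous methods can be sketched as follows. The abstract does not say which aggregation is used, so the top-k mean pooling here is an assumption chosen for illustration, and the names `video_level_scores`, `softmax`, and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence


def video_level_scores(frame_scores: Sequence[Sequence[float]], k: int) -> List[float]:
    """Pool frame-level class scores into one score per class.

    frame_scores[t][c] is the score of class c at frame t.
    Top-k mean pooling is an assumed aggregation, not taken from the abstract.
    """
    if not frame_scores:
        raise ValueError("need at least one frame")
    num_classes = len(frame_scores[0])
    k = max(1, min(k, len(frame_scores)))  # clamp k to the number of frames
    pooled = []
    for c in range(num_classes):
        # Average the k highest scores that class c reaches anywhere in the video.
        top = sorted((frame[c] for frame in frame_scores), reverse=True)[:k]
        pooled.append(sum(top) / k)
    return pooled


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn pooled scores into a video-level class distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Four frames, two action classes (toy numbers).
frame_scores = [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0], [0.0, 4.0]]
pooled = video_level_scores(frame_scores, k=2)
video_prediction = softmax(pooled)
```

Because every output slot is an action class, a frame that shows only background still has to contribute to some action score. That is the limitation the abstract points to.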

Citations: cited by 219 publications (190 citation statements)
References: 18 publications
“…These results on THUMOS'14 are summarized in Table 2. Our method outperforms all weakly supervised methods except BaSNet [17], against which it shows a slight performance decrease while being more data efficient and having a simpler network design. Besides, our iterative approach takes around 4.6 minutes to train even on CPU.…”
Section: Methods | Type: mentioning
confidence: 88%
“…At the same time, they propose a scheme generating a hard negative video for separating contexts. Although the main point of this article is not the background class, it inspires the next subsequent three works that are BaSNet [36], background modeling [37], and LPAT [38]. Without considering the background category, the background frames were misclassified into action categories, resulting in a large number of FPs.…”
Section: Current Representative Methods | Type: mentioning
confidence: 99%
“…First, we select the initial seed for the clustering algorithm. Inspired by researches in information theory [43], [44], we think a feature vector to be more informative if its magnitude is larger. Thus, we calculate the L1 norm for all vectors, sort them and select the one corresponding to the median as the initial seed for the clustering algorithm.…”
Section: Instance Segmentation Network Equipped With the Region Normalization Mechanism | Type: mentioning
confidence: 99%
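
The seed-selection step quoted in the last statement (compute the L1 norm of every feature vector, sort, take the one at the median) might look like the minimal sketch below. The function names are hypothetical. Taking the lower median when the number of vectors is even is also an assumption, since the quoted text does not say how ties or even counts are handled.

```python
from typing import Sequence


def l1_norm(vec: Sequence[float]) -> float:
    """Sum of absolute values of the components."""
    return sum(abs(x) for x in vec)


def median_norm_seed(vectors: Sequence[Sequence[float]]) -> int:
    """Return the index of the vector whose L1 norm is the median.

    For an even number of vectors the lower median is used (an assumption).
    """
    if not vectors:
        raise ValueError("need at least one feature vector")
    # Indices of the vectors, ordered by increasing L1 norm.
    order = sorted(range(len(vectors)), key=lambda i: l1_norm(vectors[i]))
    return order[(len(order) - 1) // 2]


# Toy features with L1 norms 7, 1, 2, 10, 4; the median norm is 4.
features = [[3.0, -4.0], [0.5, 0.5], [1.0, -1.0], [10.0, 0.0], [2.0, 2.0]]
seed_index = median_norm_seed(features)  # index 4, the vector [2.0, 2.0]
```

The returned index would then be handed to whatever clustering routine follows. That routine is not described in the quoted text and is left out here.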