2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00750

Weakly Supervised Action Selection Learning in Video

Abstract: Localizing actions in video is a core task in computer vision. The weakly supervised temporal localization problem investigates whether this task can be adequately solved with only video-level labels, significantly reducing the amount of expensive and error-prone annotation that is required. A common approach is to train a frame-level classifier where frames with the highest class probability are selected to make a video-level prediction. Frame-level activations are then used for localization. However, the abs…
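The selection scheme described in the abstract can be made concrete with a short sketch. The following is a minimal illustration of multiple-instance learning with top-k ("k-max") pooling over frame-level class scores; the tensor shapes, the value of k, and the loss choice are assumptions for the example, not the paper's exact configuration.

```python
# Illustrative sketch (not the paper's exact method): multiple-instance
# learning with top-k ("k-max") pooling over frame-level class logits.
import torch
import torch.nn.functional as F

def video_level_scores(frame_logits: torch.Tensor, k: int) -> torch.Tensor:
    """Aggregate frame-level class logits (T, C) into video-level logits (C,)
    by averaging the k highest-scoring frames per class."""
    topk_vals, _ = frame_logits.topk(k, dim=0)  # (k, C)
    return topk_vals.mean(dim=0)                # (C,)

# Example: 100 frames, 20 action classes, one positive video-level label.
frame_logits = torch.randn(100, 20, requires_grad=True)
video_labels = torch.zeros(20)
video_labels[3] = 1.0

video_logits = video_level_scores(frame_logits, k=8)
loss = F.binary_cross_entropy_with_logits(video_logits, video_labels)
loss.backward()  # gradients reach only the selected top-k frames per class
```

Because only the top-k frames receive gradient, the frame-level activations learned this way can be thresholded afterwards to localize the action in time, which is the behavior the abstract goes on to critique.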

Cited by 48 publications (39 citation statements) · References 37 publications

“…To solve that, MIL conducts classification for the whole video, during which it carefully selects some parts of the video with high activation scores as action segments. Typically the selection method can be either k-max pooling [44,34,30,40,25], or attention-based pooling [44,31,46,22]. On top of the basic MIL classifier, previous researchers also explored adding different regularization losses, constraints or temporal property modeling to boost the action detection performance.…”
Section: Weakly-supervised Temporal Action Detection
Mentioning confidence: 99%
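The quoted passage contrasts k-max pooling with attention-based pooling. Below is a minimal sketch of the attention-based alternative, in which a learned per-frame weight aggregates snippet features into a single video-level representation; the feature dimension and layer sizes are assumptions, and the code is illustrative rather than the formulation of any cited work.

```python
# Illustrative sketch of attention-based pooling for video-level classification.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, feat_dim: int = 1024, num_classes: int = 20):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(feat_dim, 256), nn.Tanh(),
                                       nn.Linear(256, 1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (T, D) per-frame (snippet) features
        attn = torch.softmax(self.attention(frame_feats), dim=0)  # (T, 1), sums to 1 over time
        video_feat = (attn * frame_feats).sum(dim=0)              # (D,) attention-weighted average
        return self.classifier(video_feat)                        # (C,) video-level logits

logits = AttentionPooling()(torch.randn(100, 1024))
print(logits.shape)  # torch.Size([20])
```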
“…Shou et al [40] added a contrastive loss to help action boundary prediction. And Ma et al [25] introduced a class-agnostic actionness score, which leverages context to help focus on the parts that contain actions.…”
Section: Weakly-supervised Temporal Action Detection
Mentioning confidence: 99%
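The class-agnostic actionness idea attributed to Ma et al. [25] can be sketched as a per-frame actionness probability fused with per-class activations, so that context frames are down-weighted before localization. The multiplicative fusion, normalization, and threshold below are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of fusing a class-agnostic actionness score with per-class
# activations before localization; the fusion rule and threshold are assumptions.
import torch

def localize(frame_logits: torch.Tensor,
             actionness_logits: torch.Tensor,
             cls: int,
             threshold: float = 0.5) -> torch.Tensor:
    """Return a boolean mask (T,) of frames proposed for class `cls`."""
    cls_prob = frame_logits.softmax(dim=1)[:, cls]  # (T,) per-frame class evidence
    actionness = actionness_logits.sigmoid()        # (T,) action-vs-context score
    fused = cls_prob * actionness                   # context frames are down-weighted
    fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)
    return fused >= threshold

# Example with random scores: 100 frames, 20 classes, localize class 3.
mask = localize(torch.randn(100, 20), torch.randn(100), cls=3)
print(mask.nonzero(as_tuple=True)[0])  # indices of the selected frames
```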
“…However, learning prototypes to represent different aspects of the action (e.g., the start/end of the action) is a difficult task even with annotation [9,40,41]. Thus, it is not sufficient to compute video similarity only by the global prototypes.…”
Section: Introduction
Mentioning confidence: 99%
“…Thanks to the various applications [37,55,58], it has drawn much attention from researchers, leading to the rapid and remarkable progress in the fully-supervised setting (i.e., frame-level labels) [31,51,53,60]. Meanwhile, there appear attempts to reduce the prohibitively expensive cost of annotating individual frames by devising weaklysupervised models with video-level labels [8,36,56,66].…”
Section: Introduction
Mentioning confidence: 99%