2018
DOI: 10.1007/978-3-030-01225-0_35

W-TALC: Weakly-Supervised Temporal Activity Localization and Classification

Abstract: Most activity localization methods in the literature suffer from the burden of frame-wise annotation requirement. Learning from weak labels may be a potential solution towards reducing such manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present …

Citations: cited by 291 publications (338 citation statements)
References: 64 publications
“…We then present our overall architecture followed by a detailed description of the different loss terms in the proposed formulation. Feature Extraction: As in [14,16], we use Inflated 3D (I3D) features extracted from the RGB and flow I3D deep networks [4], trained on the Kinetics dataset, to encode appearance and motion information, respectively. A video is divided into non-overlapping segments, each consisting of 16 frames.…”
Section: Methods (mentioning)
confidence: 99%
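
The segmentation step described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `split_into_segments` is hypothetical, and dropping trailing frames that do not fill a whole segment is an assumption, since the quoted text does not say how leftovers are handled. Each resulting segment would then be passed to the RGB and flow I3D networks, which are not shown here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_segments(frames: Sequence[T], segment_len: int = 16) -> List[Sequence[T]]:
    """Split a frame sequence into consecutive, non-overlapping segments.

    Assumption: trailing frames that do not fill a whole segment are dropped.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    n_full = len(frames) // segment_len  # number of complete segments
    return [
        frames[i * segment_len:(i + 1) * segment_len]
        for i in range(n_full)
    ]


# Stand-in for a 50-frame video: frame indices instead of images.
segments = split_into_segments(list(range(50)))
```

With 50 frames and the default length of 16, this yields three segments covering frames 0 to 47; the last two frames are discarded under the stated assumption.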
“…This leads to improved action localization results. However, the formulation in [16] puts a constraint on the mini-batch, used for training, to mostly contain paired videos with actions belonging to the same category. In this work, we look into an alternative formulation that allows the mini-batch to contain diverse action samples during training.…”
Section: Introduction (mentioning)
confidence: 99%
“…AutoLoc [31] is proposed to directly predict the temporal boundary of each action instance with an outer-inner-contrastive loss to train the boundary predictor. W-TALC [32] learns the specific network weights by optimizing two complimentary loss functions, namely coactivity similarity loss and multiple instance learning loss. To fully leverage the publicly available trimmed videos, this paper further studies a reliable knowledge transfer mechanism from trimmed to untrimmed videos under adaptability constraint for effective action recognition and localization with weak supervision.…”
Section: B Action Localization (mentioning)
confidence: 99%
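
To make the multiple instance learning loss named in the statement above more concrete, here is a rough sketch of one common way to write a video-level MIL loss from segment-level class scores. The statement only names the two losses, so everything here is an assumption for illustration: the top-k temporal pooling, the softmax cross-entropy against normalized video-level labels, and the function names `topk_mean_pool` and `mil_loss`. The co-activity similarity loss is not sketched.

```python
import math
from typing import List


def topk_mean_pool(scores: List[List[float]], k: int) -> List[float]:
    """Pool a T x C score matrix into C video-level scores.

    For each class, average the k largest scores over the temporal axis.
    """
    n_classes = len(scores[0])
    k = max(1, min(k, len(scores)))  # clamp k to the valid range [1, T]
    pooled = []
    for c in range(n_classes):
        column = sorted((row[c] for row in scores), reverse=True)
        pooled.append(sum(column[:k]) / k)
    return pooled


def mil_loss(scores: List[List[float]], labels: List[int], k: int) -> float:
    """Cross-entropy between softmax(pooled scores) and normalized labels.

    Assumes at least one entry of `labels` is 1 (video-level tags only).
    """
    pooled = topk_mean_pool(scores, k)
    m = max(pooled)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(p - m) for p in pooled))
    total = sum(labels)
    target = [y / total for y in labels]
    return -sum(t * (p - log_z) for t, p in zip(target, pooled))


# Three segments, two classes; the video is tagged with class 0 only.
example_scores = [[2.0, 0.0], [0.0, 0.0], [4.0, 0.0]]
example_loss = mil_loss(example_scores, labels=[1, 0], k=2)
```

Only the video-level tag enters the loss; no segment carries its own label, which is what makes the supervision weak.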