2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2018.00736

Pulling Actions out of Context: Explicit Separation for Effective Combination

Cited by 30 publications (14 citation statements)
References 24 publications
“…The results validate that mitigating scene bias can improve generalization to the target action classification datasets. As shown in the first block of Table 1, action-context factorized C3D (referred to as Factor-C3D) [61] also improves the baseline C3D [51] on UCF-101. The accuracy of Factor-C3D is on par with our debiased 3D-ResNet-18.…”
Section: Transfer Learning For Action Classification (mentioning)
confidence: 94%
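The action-context factorization referenced in this statement can be illustrated with a minimal two-branch sketch. This is not the Factor-C3D architecture; the shallow 3D encoders, feature dimensions, and fusion by concatenation below are assumptions chosen only to show the idea of separating action and context features before recombining them for classification.

```python
# Minimal sketch of action/context factorization (illustrative assumptions only,
# not the architecture of the cited paper).
import torch
import torch.nn as nn

class FactorizedActionClassifier(nn.Module):
    def __init__(self, feat_dim: int = 256, num_classes: int = 101):
        super().__init__()
        # Two separate encoders keep action evidence and scene/context evidence
        # in distinct feature spaces ("explicit separation").
        def encoder():
            return nn.Sequential(
                nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.action_encoder = encoder()
        self.context_encoder = encoder()
        # The two factors are recombined for the final prediction
        # ("effective combination").
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels=3, frames, height, width)
        action_feat = self.action_encoder(clip)
        context_feat = self.context_encoder(clip)
        return self.classifier(torch.cat([action_feat, context_feat], dim=1))

logits = FactorizedActionClassifier()(torch.randn(2, 3, 8, 32, 32))
print(logits.shape)  # torch.Size([2, 101])
```

In practice the two branches would be trained with objectives that push them toward action-specific and context-specific cues respectively; here both branches see the same clip, so the split above is structural only.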
“…Leveraging scene context is useful for object detection [40,7,11,13], semantic segmentation [35,39,40,69], predicting invisible things [32], and action recognition without looking at the human [23,54]. Some works have shown that explicitly factoring human action out of context leads to improved performance in action recognition [71,61]. In contrast to prior work that uses scene context to facilitate recognition, our method aims to learn representations that are invariant to scene bias.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, due to the dynamic environment in a video, we cannot use an L2 loss to force a one-to-one mapping [2] between elements of two activated feature maps at t and t + τ. Inspired by [6,7], we propose a novel attention mechanism that does not require pixel-to-pixel correspondence between two input videos or between two feature maps.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
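The distinction drawn in this statement, between an element-wise L2 matching loss (which assumes a one-to-one mapping of feature-map elements) and a comparison that needs no pixel-to-pixel correspondence, can be sketched as below. The correspondence-free variant uses channel-correlation (Gram) statistics as a stand-in; it is an assumed illustration, not the attention mechanism proposed in the citing paper.

```python
# Contrast between an element-wise L2 matching loss and a correspondence-free
# loss based on channel-correlation (Gram) statistics. Illustrative stand-in only.
import torch
import torch.nn.functional as F

def elementwise_l2_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    # Penalizes every element pair at the same position: implicitly assumes a
    # one-to-one mapping between the two feature maps, which breaks down when
    # the scene changes between time t and t + tau.
    return F.mse_loss(feat_a, feat_b)

def gram_statistics_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    # Compares channel-correlation statistics, which are invariant to where in
    # the frame activations occur, so no pixel-to-pixel correspondence is needed.
    def gram(feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        flat = feat.flatten(2)                                       # (B, C, H*W)
        return torch.bmm(flat, flat.transpose(1, 2)) / (c * h * w)   # (B, C, C)
    return F.mse_loss(gram(feat_a), gram(feat_b))

feat_t = torch.randn(4, 128, 14, 14)       # features at time t
feat_t_tau = torch.randn(4, 128, 14, 14)   # features at time t + tau
print(elementwise_l2_loss(feat_t, feat_t_tau).item())
print(gram_statistics_loss(feat_t, feat_t_tau).item())
```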
“…A recent method reduces context bias in video action recognition [36], but it relies on temporal information and thus cannot be applied to the image recognition problems we tackle in this work. A pre-deep learning approach [18] reduces the correlation (bias) between visual attributes by leveraging additional knowledge in the form of semantic groupings of attributes.…”
Section: Related Work (mentioning)
confidence: 99%