2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2018.00736

Pulling Actions out of Context: Explicit Separation for Effective Combination

Cited by 30 publications (14 citation statements)
References 24 publications
“…The results validate that mitigating scene bias can improve generalization to the target action classification datasets. As shown in the first block of Table 1, action-context factorized C3D (referred to as Factor-C3D) [61] also improves the baseline C3D [51] on UCF-101. The accuracy of Factor-C3D is on par with our debiased 3D-ResNet-18.…”
Section: Transfer Learning For Action Classification (mentioning)
confidence: 94%
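The action-context factorization referenced in this statement can be illustrated with a minimal two-branch sketch. This is not the Factor-C3D architecture; the shallow 3D encoders, feature dimensions, and fusion by concatenation below are assumptions chosen only to show the idea of separating action and context features before recombining them for classification.

```python
# Minimal sketch of action/context factorization (illustrative assumptions only,
# not the architecture of the cited paper).
import torch
import torch.nn as nn

class FactorizedActionClassifier(nn.Module):
    def __init__(self, feat_dim: int = 256, num_classes: int = 101):
        super().__init__()
        # Two separate encoders keep action evidence and scene/context evidence
        # in distinct feature spaces ("explicit separation").
        def encoder():
            return nn.Sequential(
                nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.action_encoder = encoder()
        self.context_encoder = encoder()
        # The two factors are recombined for the final prediction
        # ("effective combination").
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels=3, frames, height, width)
        action_feat = self.action_encoder(clip)
        context_feat = self.context_encoder(clip)
        return self.classifier(torch.cat([action_feat, context_feat], dim=1))

logits = FactorizedActionClassifier()(torch.randn(2, 3, 8, 32, 32))
print(logits.shape)  # torch.Size([2, 101])
```

In practice the two branches would be trained with objectives that push them toward action-specific and context-specific cues respectively; here both branches see the same clip, so the split above is structural only.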
“…Leveraging scene context is useful for object detection [40,7,11,13], semantic segmentation [35,39,40,69], predicting invisible things [32], and action recognition without looking at the human [23,54]. Some works have shown that explicitly factoring human action out of context leads to improved performance in action recognition [71,61]. In contrast to prior work that uses scene context to facilitate recognition, our method aims to learn representations that are invariant to scene bias.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, due to the dynamic environment in a video, we cannot use an L2 loss to force a one-to-one mapping [2] between elements of two activated feature maps at t and t + τ. Inspired by [6,7], we propose a novel attention mechanism that does not require pixel-to-pixel correspondence between two input videos or between two feature maps.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
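The distinction drawn in this statement, between an element-wise L2 matching loss (which assumes a one-to-one mapping of feature-map elements) and a comparison that needs no pixel-to-pixel correspondence, can be sketched as below. The correspondence-free variant uses channel-correlation (Gram) statistics as a stand-in; it is an assumed illustration, not the attention mechanism proposed in the citing paper.

```python
# Contrast between an element-wise L2 matching loss and a correspondence-free
# loss based on channel-correlation (Gram) statistics. Illustrative stand-in only.
import torch
import torch.nn.functional as F

def elementwise_l2_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    # Penalizes every element pair at the same position: implicitly assumes a
    # one-to-one mapping between the two feature maps, which breaks down when
    # the scene changes between time t and t + tau.
    return F.mse_loss(feat_a, feat_b)

def gram_statistics_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    # Compares channel-correlation statistics, which are invariant to where in
    # the frame activations occur, so no pixel-to-pixel correspondence is needed.
    def gram(feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        flat = feat.flatten(2)                                       # (B, C, H*W)
        return torch.bmm(flat, flat.transpose(1, 2)) / (c * h * w)   # (B, C, C)
    return F.mse_loss(gram(feat_a), gram(feat_b))

feat_t = torch.randn(4, 128, 14, 14)       # features at time t
feat_t_tau = torch.randn(4, 128, 14, 14)   # features at time t + tau
print(elementwise_l2_loss(feat_t, feat_t_tau).item())
print(gram_statistics_loss(feat_t, feat_t_tau).item())
```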
“…A recent method reduces context bias in video action recognition [36], but it relies on temporal information and thus cannot be applied to the image recognition problems we tackle in this work. A pre-deep learning approach [18] reduces the correlation (bias) between visual attributes by leveraging additional knowledge in the form of semantic groupings of attributes.…”
Section: Related Work (mentioning)
confidence: 99%