In this work, we focus on label-efficient learning for video
action detection. We develop a novel semi-supervised active
learning approach that utilizes both labeled and unlabeled
data, along with informative sample selection, for action
detection. Video action detection requires spatio-temporal
localization in addition to classification, which poses several
challenges for both active learning (informative sample
selection) and semi-supervised learning (pseudo-label
generation). First, we propose NoiseAug, a simple augmentation
strategy that effectively selects informative samples for
video action detection. Next, we propose fft-attention, a novel
technique based on high-pass filtering that enables effective
utilization of pseudo-labels for semi-supervised learning in video
action detection by emphasizing the relevant activity regions within a video.
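The abstract describes fft-attention only as being based on high-pass filtering that emphasizes the relevant activity region. The sketch below illustrates that general idea under our own assumptions and is not the authors' implementation. It applies an FFT along the time axis of a grayscale clip, zeroes the lowest temporal-frequency bins (which carry static background), and uses the per-pixel residual energy as a spatial attention map. The function name `fft_highpass_attention`, the `(T, H, W)` input layout, and the `cutoff` parameter are hypothetical.

```python
import numpy as np


def fft_highpass_attention(video: np.ndarray, cutoff: int = 1) -> np.ndarray:
    """Hypothetical sketch: temporal high-pass filtering as a spatial attention map.

    video:  grayscale clip of shape (T, H, W).
    cutoff: number of lowest temporal-frequency bins to suppress
            (bin 0 is the DC component, i.e. static content).
    Returns an (H, W) map normalized to [0, 1].
    """
    # Real FFT along the temporal axis: shape (T // 2 + 1, H, W).
    spectrum = np.fft.rfft(video, axis=0)
    # High-pass: drop the low temporal frequencies (static background).
    spectrum[:cutoff] = 0
    # Back to the time domain with the original number of frames.
    filtered = np.fft.irfft(spectrum, n=video.shape[0], axis=0)
    # Per-pixel high-frequency energy, averaged over time.
    energy = np.abs(filtered).mean(axis=0)
    max_val = energy.max()
    if max_val == 0:
        # Completely static clip: no region is emphasized.
        return np.zeros_like(energy)
    return energy / max_val
```

For example, on a clip with a constant background and a block that moves across the frame, the map is close to zero on the background and peaks along the block's path, which is the kind of activity-region emphasis the abstract refers to.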
We evaluate the proposed approach on three benchmark datasets:
UCF101-24, JHMDB-21, and Youtube-VOS. First, we demonstrate its
effectiveness on video action detection, where the proposed approach
outperforms prior semi-supervised and weakly supervised methods, as well
as several baseline approaches, on both UCF101-24 and JHMDB-21. We also
show its effectiveness on Youtube-VOS for video object segmentation,
demonstrating its ability to generalize to other dense prediction tasks in videos.