2022
DOI: 10.48550/arxiv.2207.11660
Preprint

MAR: Masked Autoencoders for Efficient Action Recognition

Abstract: Standard approaches for video action recognition usually operate on the full input videos, which is inefficient due to the widely present spatio-temporal redundancy in videos. Recent progress in masked video modelling, i.e., VideoMAE, has shown the ability of vanilla Vision Transformers (ViT) to complement spatio-temporal contexts given only limited visual contents. Inspired by this, we propose Masked Action Recognition (MAR), which reduces the redundant computation by discarding a proportion of patche…

Cited by 1 publication (1 citation statement)
References 61 publications
“…To speed up the training procedure during the ablation study, we (1) randomly select 25% videos from the training set of Something-Something V2 dataset to pre-train the model, (2) randomly drop 50% patches following Qing et al (2022), specifically, we adopt a random masking strategy to select 50% input patches to reduce computation of self-attention. We provide a comparison across the speed-up and performance decrease due to these techniques in Tab 15.…”
Section: B1 Training Speed-up Versus Performance Decrease
mentioning
confidence: 99%
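
The patch-dropping step described in the citation statement above can be illustrated with a minimal sketch. It assumes uniform random selection over a flat list of patch indices and is not the implementation from either paper; the function names, the seed, and the token count (8 x 14 x 14) are hypothetical values chosen for illustration only.

```python
import random


def random_keep_indices(num_patches, drop_ratio=0.5, seed=None):
    """Return the sorted indices of patches kept after uniform random dropping."""
    if not 0.0 <= drop_ratio < 1.0:
        raise ValueError("drop_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_keep = num_patches - int(num_patches * drop_ratio)
    # Sample without replacement so no patch is selected twice.
    return sorted(rng.sample(range(num_patches), num_keep))


def attention_pair_fraction(num_patches, num_kept):
    """Fraction of pairwise token interactions that self-attention still computes."""
    return (num_kept ** 2) / (num_patches ** 2)


# Hypothetical token count: 8 temporal x 14 x 14 spatial patches (an assumption).
num_patches = 8 * 14 * 14
kept = random_keep_indices(num_patches, drop_ratio=0.5, seed=0)
fraction = attention_pair_fraction(num_patches, len(kept))
```

Because the attention score matrix grows quadratically with the number of tokens, keeping half of the patches leaves roughly a quarter of the pairwise interactions; the other layers of a transformer scale about linearly with the token count, so the overall saving is smaller than this fraction alone suggests.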