2023
DOI: 10.48550/arxiv.2301.02217
Preprint

EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding

Abstract: Recent advances in egocentric video understanding models are promising, but their heavy computational expense is a barrier for many real-world applications. To address this challenge, we propose EgoDistill, a distillation-based approach that learns to reconstruct heavy egocentric video clip features by combining the semantics from a sparse set of video frames with the head motion from lightweight IMU readings. We further devise a novel self-supervised training strategy for IMU feature learning. Our method leads …

Citations: cited by 2 publications (3 citation statements)
References: 55 publications
“…There is a wide range of research on egocentric videos, covering topics such as human-object interactions [23], activity recognition [24][25][26], anticipation [27], video summarization [28,29], hand detection [30], parsing social interactions [31], and inferring the camera wearer's body pose [32]. Most of these works aim to evaluate behaviors over extended temporal durations.…”
Section: Egocentric Video Research (mentioning)
confidence: 99%
“…For video images with unknown camera parameters, we crop with default parameters. Some studies [24] focusing on egocentric videos have already shown the effectiveness of camera movement for action recognition. And in Figure 4, we show the different types of correlation between camera movement and salient movement.…”
Section: Camera Movement Module (mentioning)
confidence: 99%
“…Multimodal (egocentric) video understanding. In the context of (egocentric) video understanding, several works have shown that using additional modalities at inference time significantly improves performance [25,29,33,36,43,50,56,61]. The hypothesis is intuitive - certain actions are more easily understood from specific modalities, e.g.…”
Section: Related Work (mentioning)
confidence: 99%