2018
DOI: 10.48550/arxiv.1804.10021
Preprint

Deep Keyframe Detection in Human Action Videos

Xiang Yan,
Syed Zulqarnain Gilani,
Hanlin Qin
et al.

Abstract: Detecting representative frames in videos based on human actions is quite challenging because of the combined factors of human pose in action and the background. This paper addresses this problem and formulates key frame detection as one of finding the video frames that maximally contribute to differentiating the underlying action category from all other categories. To this end, we introduce a deep two-stream ConvNet for key frame detection in videos that learns to directly predict the location o…

Cited by 6 publications (5 citation statements)
References 33 publications
“…Directing the model to attend to features from the most important frames prevents the model from overfitting to less important frames which may contain irrelevant information. Hard attention-based methods such as [29,37] detect a specific set of video frames that maximally contribute to the final prediction. Rather than selecting a small number of frames to keep for further analysis and discarding the rest, we propose a soft self-attention mechanism which assigns an importance weight to every frame.…”
Section: Temporal Attention Mechanism (mentioning)
confidence: 99%
“…Directing the model to attend to features from the most important frames prevents the model from overfitting to less important frames which may contain irrelevant information. Hard attention based methods such as [38,45] detect a specific set of video frames that maximally contribute to the final prediction. Rather than selecting a small number of frames to keep for further analysis and discarding the rest, we propose a soft self attention mechanism which assigns an importance weight to every frame.…”
Section: Temporal Attention Mechanism (mentioning)
confidence: 99%
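
The citation statements above contrast hard attention, which keeps a small set of frames and discards the rest, with soft attention, which assigns a weight to every frame. The sketch below is only an illustration of that contrast and is not taken from the cited papers: the function names, the linear scoring vector `score_vector`, and the toy features are all assumptions made for the example.

```python
import math


def frame_scores(frame_features, score_vector):
    """Score each frame with a dot product (hypothetical linear scorer)."""
    return [sum(f * w for f, w in zip(frame, score_vector))
            for frame in frame_features]


def soft_attention_pool(frame_features, score_vector):
    """Soft attention: every frame gets a softmax weight, none is discarded."""
    scores = frame_scores(frame_features, score_vector)
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frame_features))
              for d in range(dim)]
    return weights, pooled


def hard_topk_pool(frame_features, score_vector, k):
    """Hard selection: keep the k highest-scoring frames, average only those."""
    scores = frame_scores(frame_features, score_vector)
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    keep.sort()  # report the kept frame indices in temporal order
    dim = len(frame_features[0])
    pooled = [sum(frame_features[i][d] for i in keep) / len(keep)
              for d in range(dim)]
    return keep, pooled


# Toy example: three frames with two-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
scorer = [1.0, 0.0]
weights, soft_pooled = soft_attention_pool(frames, scorer)
kept, hard_pooled = hard_topk_pool(frames, scorer, k=2)
```

In this toy setup the soft weights sum to one and the second frame still contributes a small amount, whereas the hard variant drops it entirely and averages only frames 0 and 2.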
“…Paradigm II) Solving the key frame video object detection in two steps, II-A) a temporal model (e.g., attention RNN, 3D/(2+1)D CNN, transformer) [15], [17], [39], [41], [69] is trained to detect the indices of the key frames, II-B) followed by object detection at the recognized key frames. In order to compare U-LanD framework against paradigm II, we consider a semi-automatic approach, where the ground-truth indices of the key frames are suggested by the cardiologist, followed Fig.…”
Section: Evaluations (mentioning)
confidence: 99%
“…As a result, the available training video datasets suffer from two limitations: 1) videos are sparsely labelled, i.e., a small portion of frames in each video have ground-truth landmark labels; and 2) the labelled frames are extensively biased towards specific points in time, i.e., only key frames in each training video are labelled. Previous work mainly divides the problem of video object detection on key frames into sub-problems of key frame recognition [13]- [17] and object detection. They propose techniques such as self-supervised learning [18], semi-supervised learning [19], label propagation [20], registration [21], and temporal cycle-consistency [22].…”
Section: Introduction (mentioning)
confidence: 99%