2021
DOI: 10.1007/s00521-021-06322-x

A novel keyframe extraction method for video classification using deep neural networks

Abstract: Combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) produces a powerful architecture for video classification problems as spatial–temporal information can be processed simultaneously and effectively. Using transfer learning, this paper presents a comparative study to investigate how temporal information can be utilized to improve the performance of video classification when CNNs and RNNs are combined in various architectures. To enhance the performance of the identified architec…

Cited by 31 publications (10 citation statements) | References 34 publications
“…This model first processes each video frame through Inception, saving the output from the network's final layer. It then converts this into an extracted feature set for training in the RNN model, which utilizes LSTM layers [17], [18], [19], [20], [21]. Other studies classify videos by selecting the best frames for the training process [21], [22], [23], [24].…”
Section: Related Work
confidence: 99%
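The pipeline quoted above (each frame passed through a CNN, the resulting feature vectors fed to an LSTM) can be sketched as follows. This is a minimal PyTorch illustration, not the cited papers' code: a tiny stand-in CNN replaces Inception so the example stays self-contained, and all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class FrameFeatureExtractor(nn.Module):
    """Stand-in for a pretrained CNN such as Inception:
    maps one RGB frame to a fixed-size feature vector."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global average pool -> (B, 8, 1, 1)
        )
        self.fc = nn.Linear(8, feat_dim)

    def forward(self, x):              # x: (B, 3, H, W)
        return self.fc(self.conv(x).flatten(1))   # (B, feat_dim)

class CnnLstmClassifier(nn.Module):
    """Extract one feature per frame, then classify the whole
    sequence from the LSTM's final hidden state."""
    def __init__(self, feat_dim=32, hidden=64, num_classes=5):
        super().__init__()
        self.cnn = FrameFeatureExtractor(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):          # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)                        # h_n: (1, B, hidden)
        return self.head(h_n[-1])                             # (B, num_classes)

model = CnnLstmClassifier()
logits = model(torch.randn(2, 16, 3, 64, 64))  # 2 clips of 16 frames each
```

In practice the CNN would be a frozen pretrained backbone (the transfer-learning setup described in the abstract), with only the LSTM and classification head trained on the extracted features.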
“…It then converts this into an extracted feature set for training in the RNN model, which utilizes LSTM layers [17], [18], [19], [20], [21]. Other studies classify videos by selecting the best frames for the training process [21], [22], [23], [24]. In the paper [9], the frame extraction method identifies areas that provide information for each frame and selects frames based on the similarities between those areas.…”
Section: Related Work
confidence: 99%
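The frame-selection idea in this excerpt (keeping frames based on similarity between them) can be sketched with a simple greedy rule: keep a frame only when it differs enough from the last kept keyframe. This is an illustrative NumPy sketch, not the cited method; the cosine measure and threshold are assumptions.

```python
import numpy as np

def select_keyframes(frames, threshold=0.9):
    """Greedy keyframe selection: keep frame i only if its cosine
    similarity to the last kept keyframe drops below `threshold`."""
    def cosine(a, b):
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    keyframes = [0]                    # always keep the first frame
    for i in range(1, len(frames)):
        if cosine(frames[i], frames[keyframes[-1]]) < threshold:
            keyframes.append(i)
    return keyframes

# Toy "video": 6 frames with an abrupt scene change at frame 3.
rng = np.random.default_rng(0)
scene_a = rng.random((4, 4))
scene_b = 1.0 - scene_a
video = [scene_a + 0.01 * rng.random((4, 4)) for _ in range(3)] + \
        [scene_b + 0.01 * rng.random((4, 4)) for _ in range(3)]
keys = select_keyframes(video, threshold=0.95)
```

Near-duplicate frames within a scene score close to 1.0 and are skipped, so only the first frame of each scene survives; real systems typically compare deep features rather than raw pixels.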
“…Deep learning proved its importance in image classification, incorporating pre-trained models and applying augmentation [6]. In this paper, a template-based keyframe extraction method is proposed which employs action-template-based similarity to extract keyframes for video classification tasks. Combining a pre-trained CNN with ConvLSTM achieved the highest classification accuracy among the compared architectures. One limitation is that the CNN architecture used did not produce the best results; it also requires a machine with more than one GPU. Future work could therefore focus on applying the proposed algorithm with more powerful architectures for real-world video classification.…”
Section: Literature Review
confidence: 99%
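The template-based similarity mentioned in this excerpt can be illustrated as ranking frames by how closely their features match an action template and keeping the best matches. This is a hedged stand-in for the paper's method, which is not specified here; the feature vectors, template, and function name are all hypothetical.

```python
import numpy as np

def template_keyframes(frame_feats, template, k=3):
    """Rank frames by cosine similarity to an action-template vector
    and return the indices of the k best matches, in temporal order."""
    t = template / (np.linalg.norm(template) + 1e-8)
    f = frame_feats / (np.linalg.norm(frame_feats, axis=1, keepdims=True) + 1e-8)
    scores = f @ t                      # one similarity score per frame
    top = np.argsort(scores)[-k:]       # indices of the k highest scores
    return sorted(top.tolist())

feats = np.eye(5)                       # 5 frames with orthogonal toy features
template = np.array([1.0, 0.0, 0.0, 0.5, 0.2])
picked = template_keyframes(feats, template, k=2)
```

With these toy features, frames 0 and 3 align best with the template and are selected as keyframes.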
“…Since annotation costs for video-based applications depend heavily on training-data volumes, there is a need to streamline key-frame identification and to curate sequences with appropriate metadata. Additionally, the recent work in [5] used self-supervised deep-learning representations to isolate key frames only from targeted, human-action-specific tasks. Motivated by these existing works, we implement a novel two-stage end-to-end process that first automatically classifies each frame of a video sequence by content, uncertainty, and lighting conditions for multiple targets across wide-field-of-view videos.…”
Section: Introduction
confidence: 99%