2021
DOI: 10.1007/s00521-021-06322-x

A novel keyframe extraction method for video classification using deep neural networks

Abstract: Combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) produces a powerful architecture for video classification problems as spatial–temporal information can be processed simultaneously and effectively. Using transfer learning, this paper presents a comparative study to investigate how temporal information can be utilized to improve the performance of video classification when CNNs and RNNs are combined in various architectures. To enhance the performance of the identified architec…

Cited by 31 publications (10 citation statements) | References 34 publications
“…This model first processes each video frame through Inception, saving the output from the network's final layer. It then converts this into an extracted feature set for training in the RNN model, which utilizes LSTM layers [17], [18], [19], [20], [21]. Other studies classify videos by selecting the best frames for the training process [21], [22], [23], [24].…”
Section: Related Work
confidence: 99%
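The pipeline quoted above (each frame passed through a CNN, the resulting feature vectors fed to an LSTM) can be sketched as follows. This is a minimal PyTorch illustration, not the cited papers' code: a tiny stand-in CNN replaces Inception so the example stays self-contained, and all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class FrameFeatureExtractor(nn.Module):
    """Stand-in for a pretrained CNN such as Inception:
    maps one RGB frame to a fixed-size feature vector."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global average pool -> (B, 8, 1, 1)
        )
        self.fc = nn.Linear(8, feat_dim)

    def forward(self, x):              # x: (B, 3, H, W)
        return self.fc(self.conv(x).flatten(1))   # (B, feat_dim)

class CnnLstmClassifier(nn.Module):
    """Extract one feature per frame, then classify the whole
    sequence from the LSTM's final hidden state."""
    def __init__(self, feat_dim=32, hidden=64, num_classes=5):
        super().__init__()
        self.cnn = FrameFeatureExtractor(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):          # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)                        # h_n: (1, B, hidden)
        return self.head(h_n[-1])                             # (B, num_classes)

model = CnnLstmClassifier()
logits = model(torch.randn(2, 16, 3, 64, 64))  # 2 clips of 16 frames each
```

In practice the CNN would be a frozen pretrained backbone (the transfer-learning setup described in the abstract), with only the LSTM and classification head trained on the extracted features.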
“…It then converts this into an extracted feature set for training in the RNN model, which utilizes LSTM layers [17], [18], [19], [20], [21]. Other studies classify videos by selecting the best frames for the training process [21], [22], [23], [24]. In the paper [9], the frame extraction method identifies areas that provide information for each frame and selects frames based on the similarities between those areas.…”
Section: Related Work
confidence: 99%
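The frame-selection idea in this excerpt (keeping frames based on similarity between them) can be sketched with a simple greedy rule: keep a frame only when it differs enough from the last kept keyframe. This is an illustrative NumPy sketch, not the cited method; the cosine measure and threshold are assumptions.

```python
import numpy as np

def select_keyframes(frames, threshold=0.9):
    """Greedy keyframe selection: keep frame i only if its cosine
    similarity to the last kept keyframe drops below `threshold`."""
    def cosine(a, b):
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    keyframes = [0]                    # always keep the first frame
    for i in range(1, len(frames)):
        if cosine(frames[i], frames[keyframes[-1]]) < threshold:
            keyframes.append(i)
    return keyframes

# Toy "video": 6 frames with an abrupt scene change at frame 3.
rng = np.random.default_rng(0)
scene_a = rng.random((4, 4))
scene_b = 1.0 - scene_a
video = [scene_a + 0.01 * rng.random((4, 4)) for _ in range(3)] + \
        [scene_b + 0.01 * rng.random((4, 4)) for _ in range(3)]
keys = select_keyframes(video, threshold=0.95)
```

Near-duplicate frames within a scene score close to 1.0 and are skipped, so only the first frame of each scene survives; real systems typically compare deep features rather than raw pixels.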
“…Deep learning proved its importance in image classification, incorporating pre-trained models and applying augmentation [6]. In this paper, a template-based keyframe extraction method is proposed which employs action-template-based similarity to extract keyframes for video classification tasks. Combining a pre-trained CNN with ConvLSTM achieved the highest classification accuracy among the compared architectures. One limitation is that the CNN architecture used did not produce the best results; it also requires a machine with more than one GPU. Future work could therefore focus on applying the proposed algorithm with more powerful architectures for real-world video classification.…”
Section: Literature Review
confidence: 99%
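The template-based similarity mentioned in this excerpt can be illustrated as ranking frames by how closely their features match an action template and keeping the best matches. This is a hedged stand-in for the paper's method, which is not specified here; the feature vectors, template, and function name are all hypothetical.

```python
import numpy as np

def template_keyframes(frame_feats, template, k=3):
    """Rank frames by cosine similarity to an action-template vector
    and return the indices of the k best matches, in temporal order."""
    t = template / (np.linalg.norm(template) + 1e-8)
    f = frame_feats / (np.linalg.norm(frame_feats, axis=1, keepdims=True) + 1e-8)
    scores = f @ t                      # one similarity score per frame
    top = np.argsort(scores)[-k:]       # indices of the k highest scores
    return sorted(top.tolist())

feats = np.eye(5)                       # 5 frames with orthogonal toy features
template = np.array([1.0, 0.0, 0.0, 0.5, 0.2])
picked = template_keyframes(feats, template, k=2)
```

With these toy features, frames 0 and 3 align best with the template and are selected as keyframes.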
“…Since annotation costs for video-based applications depend heavily on training-data volumes, there is a need to streamline key-frame identification and to curate sequences with appropriate metadata. Additionally, the recent work in [5] used self-supervised deep-learning representations to isolate key frames only from targeted, human-action-specific tasks. Motivated by these existing works, we implement a novel two-stage end-to-end process that first automatically classifies each frame of a video sequence by content, uncertainty, and lighting conditions for multiple targets across wide-field-of-view videos.…”
Section: Introduction
confidence: 99%