Background:
Technological advancement has improved the quality of human life, and it has also led people to produce large
amounts of data in the form of text, images, and videos. Significant effort is therefore needed to devise
methodologies for analyzing and summarizing this data so that it fits within storage constraints. Video summaries
can be generated either from keyframes or from skims/shots. Here, keyframe extraction is based on deep-learning
object detection techniques. Various object detection algorithms have been reviewed for generating and selecting
the best possible frames as keyframes. A set of frames is extracted from the original video sequence and, depending
on the technique used, one or more frames of the set are chosen as keyframes, which then become part of the
summarized video. This paper discusses the selection of keyframe extraction techniques in detail.
Methods:
This research paper focuses on summary generation for office surveillance videos, based on various keyframe
extraction techniques. For this purpose, object detection models such as MobileNet, SSD, and YOLO were used. A
comparative analysis of their efficiency showed that YOLO performs better than the others. Keyframe selection
techniques such as sufficient content change, maximum frame coverage, minimum correlation, curve simplification,
and clustering based on human presence in the frame were implemented.
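As an illustration of the first technique in this list, the following is a minimal sketch of keyframe selection by sufficient content change, not the implementation used in the paper. It assumes that each frame is a flattened list of grayscale pixel intensities and that content change is measured as the mean absolute pixel difference between consecutive frames; the names `mean_abs_diff` and `select_keyframes` and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened sequence of grayscale pixel intensities.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: List[Frame], threshold: float) -> List[int]:
    """Return indices of frames whose content differs from the previous
    frame by more than `threshold` (hypothetical tuning parameter)."""
    if not frames:
        return []
    keyframes = [0]  # always keep the first frame as a keyframe
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


# Toy example: five 4-pixel frames with two large content changes.
frames = [[0] * 4, [1] * 4, [50] * 4, [52] * 4, [120] * 4]
print(select_keyframes(frames, threshold=10))  # [0, 2, 4]
```

A larger threshold yields a shorter summary, so a variable-length summary could be produced by tuning this one value.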
Results:
Variable- and fixed-length video summaries were generated and analyzed for each keyframe selection technique
for office surveillance videos. The analysis shows that the output video obtained with the Clustering and Curve
Simplification approaches is compressed to half the size of the original video and requires considerably less
storage space. The technique that selects keyframes based on the change of frame content between consecutive
frames produces the best output for office room scenarios and office surveillance videos.
Conclusion:
In this paper, we discussed the process of generating a synopsis of a video that highlights the important portions
and discards the trivial and redundant parts. First, we described various object detection algorithms such as YOLO
and SSD, used in conjunction with neural networks such as MobileNet, to obtain the probabilistic score of an object
present in the video. For every frame in the input video, these algorithms generate the probability that a person
appears in the image. The results of object detection are passed to keyframe extraction algorithms to obtain the
summarized video. Our comparative analysis of keyframe selection techniques for office videos will help in
determining which keyframe selection technique is preferable.
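The hand-off from object detection to keyframe extraction can be sketched as follows. This is an illustrative simplification under stated assumptions, not the paper's clustering method: it assumes the detector has already produced one person probability per frame, and it keeps the highest-scoring frame from each contiguous run of frames in which a person is present. The function name `keyframes_from_person_scores` and the `min_score` cutoff are hypothetical.

```python
from typing import List, Optional


def keyframes_from_person_scores(scores: List[float],
                                 min_score: float = 0.5) -> List[int]:
    """Pick one keyframe index per contiguous run of frames whose person
    probability is at least `min_score` (hypothetical cutoff)."""
    keyframes: List[int] = []
    best: Optional[int] = None  # best-scoring frame index in the current run
    for i, score in enumerate(scores):
        if score >= min_score:
            if best is None or score > scores[best]:
                best = i
        elif best is not None:
            # The run of person-present frames has ended; keep its best frame.
            keyframes.append(best)
            best = None
    if best is not None:  # close a run that reaches the end of the video
        keyframes.append(best)
    return keyframes


# Toy per-frame person probabilities, as a detector might output them.
scores = [0.1, 0.6, 0.9, 0.7, 0.2, 0.1, 0.8, 0.3]
print(keyframes_from_person_scores(scores))  # [2, 6]
```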