The spread of insecure online video has become a serious social problem. Video summarization is a key step in automatically filtering target videos from the Internet. At present, most existing video summarization methods select key frames by calculating the image similarity between video frames. In this article, we introduce an image similarity measure based on superpixel segmentation and apply it to video summarization. To identify the key frames of a video, we use superpixel segmentation to cluster pixels locally by estimating the optical flow displacement field between successive frames, which allows key frames to be extracted and video redundancy to be reduced. Experimental results on the VSUMM and YouTube datasets demonstrate that the proposed method has clear advantages over state-of-the-art methods in both subjective qualitative analysis and objective quantitative evaluation.
INTRODUCTION
With the popularization of the Internet, network security is becoming more and more important. As video capture technologies have matured, huge numbers of digital videos emerge every day. As the main carrier of multimedia information, digital video appears in all aspects of our daily life in the form of drama, news, sports, and surveillance. These massive volumes of video enable people to obtain information in richer forms and bring great convenience to life. However, they also have a huge impact on network security, an important issue related to national security, social stability, and the inheritance and development of national culture. They also put tremendous pressure on video storage, transmission, archiving, and retrieval. Faced with vast amounts of video data, video summarization enables us to extract the key content without viewing the entire video. The technique not only provides a quick overview of a video but also helps increase storage density and ensure network security [1][2][3].
At present, various algorithms have been proposed to eliminate the redundancy of the original video by selecting and combining representative and meaningful portions of it. The video summarization problem was first introduced in 1994 at Carnegie Mellon University [4]. Since then, many researchers have worked on video summarization, and the field has advanced significantly. According to their output, current video summarization techniques can be roughly divided into two categories: dynamic video summarization algorithms [5] and static video summarization algorithms [6]. This article focuses on static video summarization based on key frame extraction, which selects a set of static key frames by optimizing the diversity or representativeness of the output.
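To make the idea of similarity-driven static key frame selection concrete, the following is a minimal sketch and not the method proposed in this article. It replaces the superpixel-based similarity with a plain intensity-histogram intersection and uses a greedy threshold rule. The function names, the 8-bin histogram, the flat grayscale frame representation, and the 0.7 threshold are all illustrative assumptions.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of a non-empty, flat grayscale frame.

    Assumption: pixel values are integers in [0, max_value).
    """
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection in [0, 1]; 1 means identical histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def select_key_frames(frames: Sequence[Sequence[int]], threshold: float = 0.7) -> List[int]:
    """Greedy selection (illustrative stand-in, not the article's method).

    The first frame is always kept. A later frame becomes a key frame when
    its similarity to the most recently kept key frame drops below `threshold`.
    """
    if not frames:
        return []
    key_indices = [0]
    last_hist = histogram(frames[0])
    for idx in range(1, len(frames)):
        current_hist = histogram(frames[idx])
        if similarity(current_hist, last_hist) < threshold:
            key_indices.append(idx)
            last_hist = current_hist
    return key_indices


if __name__ == "__main__":
    # Four toy 4x4 frames: two dark frames followed by two bright frames.
    dark_a, dark_b = [10] * 16, [12] * 16
    bright_a, bright_b = [200] * 16, [210] * 16
    print(select_key_frames([dark_a, dark_b, bright_a, bright_b]))
```

On the toy input, the call returns `[0, 2]`: the second dark frame and the second bright frame are treated as redundant because their histograms match the preceding key frame. A lower threshold keeps fewer frames, while a higher one favors diversity at the cost of a longer summary.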