This paper presents a video summarization method designed specifically for producing static summaries of consumer videos. Because consumer videos usually have unclear shot boundaries and many low-quality or meaningless frames, we propose a two-step approach: the first step skims the video, and the second step performs content-aware clustering with keyframe selection. Specifically, the first step removes most of the redundant frames, which contain little new information, by applying spectral clustering to color histogram features. The result is a condensed video that is shorter and has clearer temporal boundaries than the original. In the second step, we perform a rough temporal segmentation and then apply refined clustering within each temporal segment, where each frame is represented by the sparse coding of its SIFT features. A keyframe is selected from each cluster according to measures of representativeness and visual quality, where representativeness is defined from the sparse coding and visual quality is a combination of contrast, blur, and image skew measures. Keyframe selection, that is, finding frames that are both representative and of high quality, is formulated as an optimization problem. Experiments on videos of various lengths show that the resulting summaries closely follow the important content of the videos.
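The keyframe selection idea described above can be illustrated with a minimal sketch. It is not the formulation used in the paper: the abstract does not give the optimization problem, so the sketch assumes a simple weighted sum of the two terms, maximized independently within each cluster. Representativeness is approximated here by closeness to the cluster centroid rather than being derived from the sparse coding, and the function name `select_keyframes`, the weight `alpha`, and the precomputed `quality` scores are all hypothetical.

```python
import numpy as np


def select_keyframes(features, labels, quality, alpha=0.5):
    """Pick one keyframe index per cluster (illustrative sketch only).

    features: (n_frames, d) per-frame descriptors
    labels:   (n_frames,) cluster id of each frame
    quality:  (n_frames,) visual-quality score in [0, 1], higher is better
    alpha:    hypothetical weight trading representativeness against quality
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    quality = np.asarray(quality, dtype=float)

    keyframes = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centroid = features[idx].mean(axis=0)
        dist = np.linalg.norm(features[idx] - centroid, axis=1)

        # Assumed stand-in for representativeness: 1 at the centroid,
        # 0 at the frame farthest from it within the cluster.
        if dist.max() > 0:
            rep = 1.0 - dist / dist.max()
        else:
            rep = np.ones_like(dist)

        # Weighted sum of the two criteria; the frame with the best score wins.
        score = alpha * rep + (1.0 - alpha) * quality[idx]
        keyframes[int(c)] = int(idx[np.argmax(score)])
    return keyframes
```

Setting `alpha` close to 1 favors the most typical frame of each cluster, while a value close to 0 favors the sharpest, best-exposed frame regardless of how typical it is.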