This paper presents a co-salient object detection method to find common salient regions in a set of images. We utilize deep saliency networks to transfer co-saliency prior knowledge and better capture high-level semantic information, and the resulting initial co-saliency maps are enhanced by seed propagation steps over an integrated graph. The deep saliency networks are trained in a supervised manner to avoid weakly supervised online learning, and we exploit them not only to extract high-level features but also to produce both intra- and inter-image saliency maps. Through a refinement step, the initial co-saliency maps uniformly highlight co-salient regions and locate accurate object boundaries. To handle input image groups whose images vary in size, we propose to pool multi-regional descriptors that include both within-segment and within-group information. In addition, an integrated multilayer graph is constructed so that seed propagation with low-level descriptors can recover regions that the previous steps may miss. In this way, we exploit the complementary strengths of high- and low-level information together with several learning-based steps. Our experiments demonstrate that the proposed approach outperforms comparable co-saliency detection methods on widely used public databases and can also be applied directly to co-segmentation tasks.
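To make the propagation step concrete, the following is a minimal sketch of graph-based seed propagation in a manifold-ranking style; it is not the paper's exact construction of the integrated multilayer graph, and the affinity matrix W, the seed vector, and the alpha parameter are illustrative assumptions.

```python
import numpy as np

def propagate_seeds(W, seeds, alpha=0.99):
    """Spread co-saliency seed scores over a region-level graph.

    W     : (n, n) symmetric affinity matrix between regions
            (e.g., built from low-level color/texture descriptors).
    seeds : (n,) initial co-saliency scores used as seeds.
    alpha : propagation strength; values near 1 spread scores farther.
    """
    # Symmetrically normalize the affinities: S = D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Closed-form manifold-ranking solution: f = (I - alpha * S)^{-1} seeds
    n = W.shape[0]
    f = np.linalg.solve(np.eye(n) - alpha * S, seeds)

    # Rescale to [0, 1] so the scores can be read back as a co-saliency map
    return (f - f.min()) / (f.max() - f.min() + 1e-12)
```

At region-level graph sizes, solving the closed-form system once is typically cheap; an iterative update f ← alpha·S·f + (1 − alpha)·seeds would serve equally well.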
This paper presents a video summarization method designed specifically for producing static summaries of consumer videos. Because consumer videos usually have unclear shot boundaries and many low-quality or meaningless frames, we propose a two-step approach in which the first step skims the video and the second step performs content-aware clustering with keyframe selection. Specifically, the first step removes most of the redundant frames, which contain little new information, by applying spectral clustering to color histogram features. As a result, we obtain a condensed video that is shorter and has clearer temporal boundaries than the original. In the second step, we perform rough temporal segmentation and then apply refined clustering within each temporal segment, where each frame is represented by the sparse coding of SIFT features. Keyframe selection from each cluster is based on measures of representativeness and visual quality, where representativeness is derived from the sparse coding and visual quality combines contrast, blur, and image-skew measures. Keyframe selection is thus formulated as an optimization problem that seeks frames that are both representative and of high quality. Experiments on videos of various lengths show that the resulting summaries closely follow the important content of the videos.
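As a rough illustration of the per-cluster keyframe selection, the sketch below scores each frame by a weighted sum of a precomputed representativeness value and a simple visual-quality proxy, then keeps the best frame from every cluster. The quality proxy (contrast plus a Laplacian-based sharpness term), the weight lam, and the names frames, cluster_labels, and representativeness are assumptions for illustration, not the paper's exact optimization.

```python
import cv2
import numpy as np

def frame_quality(frame_bgr, w_contrast=0.5, w_sharp=0.5):
    """Heuristic visual-quality score: contrast plus sharpness (inverse blur).

    The measures and weights here are illustrative stand-ins for the paper's
    combination of contrast, blur, and image-skew measures.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    contrast = gray.std() / 255.0                   # spread of intensities
    sharp = cv2.Laplacian(gray, cv2.CV_64F).var()   # low variance => blurry
    sharp = sharp / (sharp + 100.0)                 # squash to (0, 1)
    return w_contrast * contrast + w_sharp * sharp

def select_keyframes(frames, cluster_labels, representativeness, lam=0.5):
    """Pick one keyframe per cluster by trading off representativeness
    (e.g., derived from sparse-coding reconstruction) against visual quality."""
    keyframes = []
    for c in np.unique(cluster_labels):
        idx = np.flatnonzero(cluster_labels == c)
        scores = [lam * representativeness[i]
                  + (1.0 - lam) * frame_quality(frames[i]) for i in idx]
        keyframes.append(int(idx[int(np.argmax(scores))]))
    return sorted(keyframes)
```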