Typically, a salient object detection (SOD) model faces opposite requirements when processing object interiors and boundaries. Features of interiors should be invariant to strong appearance change so as to pop out the salient object as a whole, while features of boundaries should be selective to slight appearance change so as to distinguish salient objects from the background. To address this selectivity-invariance dilemma, we propose a novel boundary-aware network with successive dilation for image-based SOD. In this network, feature selectivity at boundaries is enhanced by incorporating a boundary localization stream, while feature invariance at interiors is guaranteed by a complex interior perception stream. Moreover, a transition compensation stream is adopted to amend probable failures in transitional regions between interiors and boundaries. In particular, an integrated successive dilation module is proposed to enhance the feature invariance at interiors and transitional regions. Extensive experiments on six datasets show that the proposed approach outperforms 16 state-of-the-art methods.
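The abstract does not specify how the integrated successive dilation module is wired, so the following is a minimal sketch under common assumptions: parallel 3x3 convolutions with successively increasing dilation rates, concatenated and fused by a 1x1 convolution. The class name, rate schedule, and fusion scheme are hypothetical, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SuccessiveDilationModule(nn.Module):
    """Sketch: parallel 3x3 convolutions with successively larger dilation
    rates; their concatenation is fused by a 1x1 convolution so interior
    features gain large-context invariance."""

    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        # Each branch covers a different receptive field; fusing them gives
        # interior pixels context that is robust to local appearance change.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```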
Image-based salient object detection (SOD) has been extensively studied in the past decades. However, video-based SOD is much less explored, since there is a lack of large-scale video datasets in which salient objects are unambiguously defined and annotated. Toward this end, this paper proposes a video-based SOD dataset that consists of 200 videos (64 minutes). In constructing the dataset, we manually annotate all objects and regions over 7,650 uniformly sampled keyframes and collect the eye-tracking data of 23 subjects who free-view all videos. From the user data, we find that salient objects in video can be defined as objects that consistently pop out throughout the video, and objects with such attributes can be unambiguously annotated by combining manually annotated object/region masks with the eye-tracking data of multiple subjects. To the best of our knowledge, it is currently the largest dataset for video-based salient object detection. Based on this dataset, this paper proposes an unsupervised baseline approach for video-based SOD using saliency-guided stacked autoencoders. In the proposed approach, multiple spatiotemporal saliency cues are first extracted at the pixel, superpixel, and object levels. With these saliency cues, stacked autoencoders are constructed without supervision, which automatically infer a saliency score for each pixel by progressively encoding the high-dimensional saliency cues gathered from the pixel and its spatiotemporal neighbors. Experimental results show that the proposed unsupervised approach outperforms 30 state-of-the-art models on the proposed dataset, including 19 classic image-based models (unsupervised or non-deep-learning), 6 deep-learning image-based models, and 5 unsupervised video-based models. Moreover, benchmarking results show that the proposed dataset is very challenging and has the potential to boost the development of video-based SOD.
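The abstract says the stacked autoencoders progressively encode high-dimensional saliency cues into a per-pixel score, but it gives no layer sizes or training procedure. The sketch below illustrates only the progressive-encoding idea; the cue dimensionality, layer widths, and sigmoid readout are assumptions.

```python
import torch.nn as nn

class SaliencyStackedAE(nn.Module):
    """Sketch: progressively encode a high-dimensional cue vector into a
    single saliency score. Layer widths are hypothetical; in the paper
    each layer would be built without supervision (e.g., greedy
    layer-wise autoencoder training) before stacking."""

    def __init__(self, cue_dim=64, widths=(32, 16, 1)):
        super().__init__()
        layers, d = [], cue_dim
        for w in widths:
            layers += [nn.Linear(d, w), nn.Sigmoid()]
            d = w
        self.encoder = nn.Sequential(*layers)

    def forward(self, cues):
        # cues: (N, cue_dim) saliency cues gathered from a pixel and its
        # spatiotemporal neighbors -> (N, 1) saliency score in [0, 1].
        return self.encoder(cues)
```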
Semantic object segmentation in video is an important step for large-scale multimedia analysis. In many cases, however, semantic objects are only tagged at the video level, making them difficult to locate and segment. To address this problem, this paper proposes an approach to segmenting semantic objects in weakly labeled video via object detection. In our approach, a novel video segmentation-by-detection framework is proposed, which first incorporates object and region detectors pre-trained on still images to generate a set of detection and segmentation proposals. Based on these noisy proposals, several object tracks are then initialized by solving a joint binary optimization problem with min-cost flow. As such tracks provide rough configurations of semantic objects, we refine the object segmentation while preserving spatiotemporal consistency by inferring the shape likelihoods of pixels from the statistical information of the tracks. Experimental results on the YouTube-Objects and SegTrack v2 datasets demonstrate that our method outperforms state-of-the-art approaches.
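The abstract casts track initialization as a joint binary optimization solved with min-cost flow. The toy sketch below shows one common way to pose such a problem, not the paper's exact formulation: a single unit of flow stands in for one object track, the cost functions are hypothetical, and integer costs are assumed (networkx's network simplex expects them).

```python
import networkx as nx

def init_track(frames, trans_cost, det_reward=-1):
    """Toy min-cost-flow track initialization over detection proposals.

    frames: list (per frame) of lists of proposal ids.
    trans_cost(p, q): integer cost of linking proposal p to proposal q
    in the next frame. One unit of source-to-sink flow selects one
    proposal per frame, forming a single track.
    """
    g = nx.DiGraph()
    g.add_node("S", demand=-1)  # source emits one unit of flow (one track)
    g.add_node("T", demand=1)   # sink absorbs it
    for t, props in enumerate(frames):
        for p in props:
            u, v = ("in", t, p), ("out", t, p)
            # Negative cost rewards selecting a confident proposal.
            g.add_edge(u, v, weight=det_reward, capacity=1)
            if t == 0:
                g.add_edge("S", u, weight=0, capacity=1)
            if t == len(frames) - 1:
                g.add_edge(v, "T", weight=0, capacity=1)
            else:
                for q in frames[t + 1]:
                    g.add_edge(v, ("in", t + 1, q),
                               weight=trans_cost(p, q), capacity=1)
    # Edges carrying flow == 1 trace out the selected track.
    return nx.min_cost_flow(g)
```

The paper's full formulation would handle multiple tracks and noisier proposals jointly; this toy version omits those aspects and only shows the flow-network structure.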
Recent years have witnessed great progress in salient object detection (SOD), which benefits from the effectiveness of various feature aggregation strategies. However, existing methods usually aggregate low-level features containing details and high-level features containing semantics over a large span, which introduces noise into the aggregated features and generates inaccurate saliency maps. To address this issue, we propose the pyramidal feature shrinking network (PFSNet), which aggregates adjacent feature nodes in pairs with layer-by-layer shrinkage, so that the aggregated features fuse effective details and semantics together and discard interfering information. Specifically, a pyramidal shrinking decoder (PSD) is proposed to aggregate adjacent features hierarchically in a progressive manner. Unlike other methods that aggregate features with significantly different information, this method focuses only on adjacent feature nodes in each layer and shrinks them to a final unique feature node. Moreover, we propose an adjacent fusion module (AFM) that performs mutual spatial enhancement between adjacent features so as to dynamically weight the features and adaptively fuse appropriate information. In addition, a scale-aware enrichment module (SEM) based on the features extracted from the backbone is utilized to obtain rich scale information and generate diverse initial features with dilated convolutions. Extensive quantitative and qualitative experiments demonstrate that the proposed intuitive framework outperforms 14 state-of-the-art approaches on 5 public datasets.
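The abstract states that AFM performs mutual spatial enhancement and dynamic weighting between adjacent features without giving the exact operations. The sketch below uses sigmoid-gated spatial attention as one plausible realization; the gating form, bilinear upsampling, and merge convolution are assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn

class AdjacentFusionModule(nn.Module):
    """Sketch: two adjacent feature maps gate each other spatially
    (sigmoid attention), then merge into a single node of the
    shrinking pyramid."""

    def __init__(self, channels):
        super().__init__()
        self.gate_low = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.gate_high = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.merge = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, f_low, f_high):
        f_high = self.up(f_high)       # match the finer resolution
        a_low = self.gate_low(f_low)   # where details are confident
        a_high = self.gate_high(f_high)  # where semantics are confident
        # Mutual enhancement: each feature is reweighted by the other's map.
        f_low, f_high = f_low * a_high, f_high * a_low
        return self.merge(torch.cat([f_low, f_high], dim=1))
```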