2023
DOI: 10.1109/tmm.2021.3129052

Self-Sufficient Feature Enhancing Networks for Video Salient Object Detection

citations
Cited by 7 publications
(2 citation statements)
references
References 74 publications
“…Unlike image-based saliency object detection that only uses spatial information within a single frame to predict saliency maps, VSOD needs to explore the motion information hidden in video sequences. However, many works [9][10][11][12][13] have not focused on the impact of motion information on saliency detection results. For example, Zhang et al [14] proposed the determination of temporal and spatial misalignment by fusing the temporal alignment feature and spatial feature of adjacent frames.…”
Section: Introduction (mentioning)
confidence: 99%
“…The video salient object detection (VSOD), also known as zero-shot video segmentation [1], [2], [3], [4], [5], [6], has received extensive research attention in recent years, whose primary objective is to segment video objects that attract the human visual attention most [7], [8], [9]. Different from the widely studied image salient object detection (ISOD) using spatial information only [10], [11], [12], the temporal information provided by the video data makes the saliency detection task more difficult [13], [14], [15], and we give an in-depth discussion regarding this issue to clearly demonstrate our motivation.…”
Section: Introduction (mentioning)
confidence: 99%