2020
DOI: 10.1007/978-3-030-58558-7_25

Unified Image and Video Saliency Modeling

Abstract: Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. On the one hand, image saliency modeling is a well-studied problem, and progress on benchmarks like SALICON and MIT300 is slowing. For video saliency prediction, on the other hand, rapid gains have been achieved on the recent DHF1K benchmark through network architectures that are optimized for this task. Here, we take a step back and ask: Can image and video saliency modeling be approached via…

Cited by 129 publications (98 citation statements) · References 48 publications
“…While unsupervised domain adaptation has been applied to image classification (Ganin 2016; Tzeng et al 2017), face recognition (Kan et al 2015), object detection (Tang 2016), semantic segmentation (Zhang et al 2020) and video action recognition (Li et al 2018), among others, our work is, to our knowledge, the first to deal with unsupervised domain adaptation for video saliency prediction. It is worth noting that this is technically and fundamentally different from the form of domain adaptation proposed in UNISAL (Droste et al 2020), which instead learns domain-specific parameters. This means that, at inference time, UNISAL requires knowing the source dataset of a given input in order to select the domain-specific learned parameters.…”
Section: Related Work
confidence: 99%
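For concreteness, the domain-specific parameter mechanism described in this excerpt can be sketched as a module that keeps one set of normalization parameters per training dataset and selects among them by a domain key. This is a minimal illustrative sketch, not UNISAL's actual implementation; the class name, the dataset keys, and the choice of BatchNorm as the domain-specific layer are assumptions.

```python
import torch
import torch.nn as nn

class DomainAdaptiveBN(nn.Module):
    """One BatchNorm2d per source dataset, selected by a domain key.

    Illustrative sketch only; names and the BatchNorm choice are assumptions.
    """
    def __init__(self, num_features,
                 domains=("SALICON", "DHF1K", "Hollywood2", "UCFSports")):
        super().__init__()
        self.bns = nn.ModuleDict(
            {d: nn.BatchNorm2d(num_features) for d in domains})

    def forward(self, x, domain):
        # The source dataset of the input must be known here so the matching
        # learned parameters can be selected -- the inference-time requirement
        # noted in the quoted passage.
        return self.bns[domain](x)

layer = DomainAdaptiveBN(64)
features = torch.randn(2, 64, 32, 32)
out = layer(features, domain="DHF1K")  # fails if the domain is unknown
```

The forward pass makes the quoted limitation explicit: a `domain` argument must be supplied at inference time, which is precisely what an unsupervised domain adaptation approach avoids.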
“…It is also different from unsupervised salient object detection (Zhang et al 2018), which instead attempts to predict saliency by exploiting large sets of unlabelled or weakly-labelled samples. However, we also provide HD²S with domain-specific learning capabilities as in (Droste et al 2020), showing how this mechanism improves performance but cannot be applied in unsupervised domain adaptation scenarios.…”
Section: Related Work
confidence: 99%
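By contrast, the unsupervised domain adaptation these excerpts pursue needs no domain key at inference. A common building block in the Ganin (2016) line of work cited above is a gradient-reversal layer; the sketch below is a generic illustration of that trick under that assumption, not the citing paper's code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reversed, scaled gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient makes the feature extractor work *against*
        # a domain classifier, pushing it toward domain-invariant features,
        # with no domain label needed at inference time.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage during training (domain_classifier is a hypothetical module):
# domain_logits = domain_classifier(grad_reverse(features, lambd=0.5))
```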
“…Fourthly, by employing an element-wise multiplication between $S^m_{V,C}$ and $S$…, which is further processed with a $1 \times 1$ convolution and a resize-convolution operation to generate a high-level semantic-aware attention map $F^m_{\mathrm{semantic}}$. 2) Center-Bias Prior: According to previous studies [28], [66], [67], human attention tends to concentrate on the center of a scene, which is termed the center-bias phenomenon. To this end, a learnable center-bias prior function is adopted, following our preceding work [40].…”
Section: Multi-Cues Integration
confidence: 99%
“…Salience in dynamic scenes is related to but conceptually different from salience in static images [27]. Specific methods for the dynamic case have been studied [28,29,30,31,32,33] and, very recently, unified image-video approaches [34] have been proposed, but only in the context of spatial salience. For gaze prediction, temporal features are found to be of key importance in rare events, while static spatial features can explain gaze in most cases [35].…”
Section: Related Work
confidence: 99%