2018
DOI: 10.1007/978-3-030-01264-9_37
DeepVS: A Deep Learning Based Video Saliency Prediction Approach

Cited by 166 publications (147 citation statements)
References 30 publications
“…-Static Unsupervised (SU): Itti [17], LeMeur [27], GBVS [12], SUN [42], Judd [20], Hou [14], RARE2012 [34], BMS [41], -Static Deep learning (SD): Salicon [16], DeepNet [33], ML-Net [6], SalGAN [32], -Dynamic Unsupervised (DU): Fang [8], OBDL [13], -Dynamic Machine learning (DM): PQFT [10], Rudoy [35], -Dynamic Deep learning (DD): DeepVS [19], ACL-Net [40], STSconvNet [1], FGRNE [28].…”
Section: Taxonomy
Mentioning confidence: 99%
“…Let us stress that only very few works address the temporal dimension in traditional and UAV videos. Methods that tackle the temporal dimension comprise hand-crafted motion features [10,35], network architecture fed with optical flow [1], possibly in a two-layer fashion [1,7], or Long Short-Term Memory (LSTM) architectures [2,19,40,28] to benefit from their memory functionality.…”
Section: Introduction
Mentioning confidence: 99%
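The LSTM-based designs this excerpt refers to share one pattern: per-frame features are passed through a recurrent layer so the prediction at frame t can draw on the memory of earlier frames. Below is a minimal, hypothetical PyTorch sketch of that pattern only; the class name, layer sizes, and map resolution are arbitrary assumptions and do not reproduce any of the cited models ([2,19,40,28]).

```python
# Illustrative sketch (not any cited model): per-frame CNN features are passed
# through an LSTM so that the saliency prediction at frame t can draw on the
# memory of previous frames. All layer sizes are arbitrary assumptions.
import torch
import torch.nn as nn

class RecurrentSaliency(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128, map_size=28):
        super().__init__()
        self.map_size = map_size
        # lightweight per-frame encoder standing in for a pretrained CNN backbone
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # the LSTM carries temporal context across frames
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # decode each hidden state into a coarse saliency map
        self.decoder = nn.Linear(hidden_dim, map_size * map_size)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1))  # (B*T, feat_dim)
        feats = feats.view(b, t, -1)
        hidden, _ = self.lstm(feats)              # (B, T, hidden_dim)
        maps = self.decoder(hidden).view(b, t, 1, self.map_size, self.map_size)
        return torch.sigmoid(maps)                # per-frame saliency maps

saliency = RecurrentSaliency()(torch.randn(2, 8, 3, 112, 112))
print(saliency.shape)  # torch.Size([2, 8, 1, 28, 28])
```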
“…The work in [7] uses a 3D CNN to extract features, plus an LSTM network to expand the temporal span of the analysis. Other researchers use additional modules, such as the attention mechanism [75] or an object-to-motion sub-network [29].…”
Section: Saliency Prediction
Mentioning confidence: 99%
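As a rough illustration of the 3D-CNN-plus-LSTM pattern this excerpt describes (not the implementation of [7]), the sketch below lets a small 3D convolutional encoder summarise short clips, while an LSTM chains those clip-level features to cover a longer temporal span. Class names, layer sizes, and the scalar per-clip output are assumptions made for the example.

```python
# Minimal sketch of the general 3D-CNN + LSTM pattern (not the method of [7]):
# a small 3D CNN summarises short clips, and an LSTM strings those clip-level
# features together to extend the temporal span of the analysis.
import torch
import torch.nn as nn

class Clip3DEncoder(nn.Module):
    """3D convolutions over a short clip -> one feature vector per clip."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.proj = nn.Linear(32, out_dim)

    def forward(self, clip):                   # clip: (B, 3, T, H, W)
        return self.proj(self.conv(clip).flatten(1))

class LongRangeSaliency(nn.Module):
    """LSTM over successive clip features extends the temporal context."""
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.encoder = Clip3DEncoder(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)   # e.g. one saliency score per clip

    def forward(self, clips):                  # clips: (B, N, 3, T, H, W)
        b, n = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1)).view(b, n, -1)
        out, _ = self.lstm(feats)
        return torch.sigmoid(self.head(out))   # (B, N, 1)

scores = LongRangeSaliency()(torch.randn(1, 5, 3, 8, 64, 64))
print(scores.shape)  # torch.Size([1, 5, 1])
```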
“…For the most part, this issue is addressed inconsistently. The majority of the data sets either make no explicit mention of separating smooth pursuit from fixations (ASCMN [51], SFU [24], two Hollywood2-based sets [45,71], DHF1K [75]) or rely on the event detection built into the eye tracker, which in turn does not differentiate SP from fixations (TUD [4], USC CRCNS [13], CITIUS [39], LEDOV [29]). IRCCyN/IVC (Video 1) [9] does not mention any eye movement types at all, while IRCCyN/IVC (Video 2) [18] only names SP in passing.…”
Section: Video Saliency Data Sets
Mentioning confidence: 99%
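To see why an eye tracker's built-in event detection tends to lump smooth pursuit in with fixations, consider a plain velocity-threshold (I-VT style) classifier: every sample below the saccade threshold is labelled a fixation, including slow pursuit. The toy sketch below is only a hedged illustration of that point; the function name, sampling rate, and threshold are assumed values, not those used by any of the cited data sets.

```python
# Toy illustration: a velocity-threshold (I-VT style) event detector, like the
# detection built into many eye trackers, labels every low-velocity sample as
# "fixation", so slow smooth-pursuit samples end up in the fixation class.
# Thresholds and sampling rate are arbitrary assumptions.
import numpy as np

def ivt_labels(gaze_deg, sampling_hz=250.0, saccade_thresh_deg_s=30.0):
    """gaze_deg: (N, 2) gaze positions in degrees of visual angle."""
    velocity = np.linalg.norm(np.diff(gaze_deg, axis=0), axis=1) * sampling_hz
    labels = np.where(velocity > saccade_thresh_deg_s, "saccade", "fixation")
    return np.concatenate([labels, labels[-1:]])  # pad to input length

# A smooth pursuit at ~10 deg/s stays well under the saccade threshold ...
t = np.arange(0, 1, 1 / 250.0)
pursuit = np.stack([10.0 * t, np.zeros_like(t)], axis=1)
print(set(ivt_labels(pursuit)))  # {'fixation'}: pursuit is misfiled as fixation
```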
“…Long Short-term Memory (LSTM) networks have also been used for tracking visual saliency in both static images [12] and video stimuli [63]. In order to improve saliency estimation in videos, many approaches employ multi-stream networks, such as RGB/Optical Flow (OF) [3] or RGB/OF/Depth [40], or multiple subnets such as objectness/motion [31] or saliency/gaze [22] pathways. Action Recognition: The work of [32] explored several approaches for fusing information over the temporal dimension, while in [30] 3D spatio-temporal convolutions have been proposed, whose performance can be boosted when trained on large datasets [55,57] or by employing ResNet architectures [25].…”
Section: Related Work
Mentioning confidence: 99%
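The multi-stream designs listed in this excerpt typically run separate pathways (for example appearance and motion) and fuse their outputs. The following is a hedged, generic two-stream sketch with late fusion by channel concatenation; the architecture, layer sizes, and fusion rule are arbitrary choices and are not taken from [3], [40], [31], or [22].

```python
# Generic two-stream sketch (RGB + optical flow, late fusion); all design
# choices here are illustrative assumptions, not any cited model.
import torch
import torch.nn as nn

def small_cnn(in_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    )

class TwoStreamSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = small_cnn(3)    # appearance pathway
        self.flow_stream = small_cnn(2)   # motion pathway (2-channel optical flow)
        self.fuse = nn.Conv2d(128, 1, 1)  # late fusion over concatenated channels

    def forward(self, rgb, flow):         # rgb: (B,3,H,W), flow: (B,2,H,W)
        fused = torch.cat([self.rgb_stream(rgb), self.flow_stream(flow)], dim=1)
        return torch.sigmoid(self.fuse(fused))  # coarse saliency map

sal = TwoStreamSaliency()(torch.randn(1, 3, 128, 128), torch.randn(1, 2, 128, 128))
print(sal.shape)  # torch.Size([1, 1, 32, 32])
```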