Perceptual modeling in the problem of active object recognition in visual scenes
2016
DOI: 10.1016/j.patcog.2016.03.007

Abstract: Incorporating models of human perception into the process of scene interpretation and object recognition in visual content is a strong trend in computer vision. In this paper we tackle the modeling of visual perception via automatic visual saliency maps for object recognition. Visual saliency represents an efficient way to drive the scene analysis towards particular areas considered 'of interest' for a viewer and an efficient alternative to computationally intensive sliding window methods for object recognition…

Cited by 27 publications (9 citation statements)
References: 41 publications
“…• The second is to predict where people look, in order to address traditional image and video applications, such as object [118,119] and action [120,121] recognition, video summarization [122], patient diagnosis [123] or image quality assessment [124], in broader and more complex scenarios, while providing efficient solutions and better performance. In line with the second objective, our contributions in Chapters 5 and 6 aim to facilitate the task of a CCTV operator in a video surveillance scenario, by means of a deep architecture that models attention in the temporal domain.…”
Section: Applications
confidence: 99%
“…Since we are in a highly unbalanced scenario, in which the areas that attract visual attention are far less prominent than those that inhibit it, we need to prevent the latter from dominating the learning process, which might lead to poor performance. To that end, we have used the Non-Uniform Sampling (NUS) strategy proposed in [118], which makes it possible to generate training datasets that balance the number of attracting and non-attracting points. While the former are selected based on the GT masks computed from human fixations for a given video frame, non-attracting points are sampled from those spatial locations which have not been fixated by viewers in any frame of the same video.…”
Section: Learning Sub-tasks for Spatio-temporal Visual Attention Estimation
confidence: 99%
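To make the balancing idea quoted above concrete, here is a minimal sketch of non-uniform point sampling over one video, assuming `fixation_masks` is a list of per-frame binary ground-truth fixation maps. The function name, array shapes and per-class budget are illustrative assumptions, not the implementation described in [118].

```python
# Minimal sketch of balanced (non-uniform) point sampling for one video.
# Assumption (not from the cited paper): fixation_masks is a list of HxW
# binary arrays (1 = fixated pixel), one per frame of the same video.
import numpy as np

def sample_balanced_points(fixation_masks, frame_idx, n_per_class=256, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    frame_mask = fixation_masks[frame_idx].astype(bool)

    # Attracting points: pixels fixated in the current frame.
    pos_ys, pos_xs = np.nonzero(frame_mask)

    # Non-attracting points: pixels never fixated in ANY frame of the video.
    never_fixated = ~np.any(np.stack(fixation_masks).astype(bool), axis=0)
    neg_ys, neg_xs = np.nonzero(never_fixated)

    # Draw the same number of points from each class to balance the set.
    n = min(n_per_class, len(pos_ys), len(neg_ys))
    pos_sel = rng.choice(len(pos_ys), size=n, replace=False)
    neg_sel = rng.choice(len(neg_ys), size=n, replace=False)

    points = np.concatenate([
        np.stack([pos_ys[pos_sel], pos_xs[pos_sel]], axis=1),
        np.stack([neg_ys[neg_sel], neg_xs[neg_sel]], axis=1),
    ])
    labels = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    return points, labels  # (2n, 2) pixel coordinates and their labels
```

Drawing equally many points from each class keeps the abundant non-attracting locations from dominating the training signal.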
“…The main benefits are the reduction of the search space by providing a small set of high-quality locations, thus allowing the use of more expensive and powerful recognition techniques, and the ability to naturally localize objects without a fixed aspect ratio. Object proposal algorithms are aligned with the object-level saliency detection paradigm [41,42] in modeling a selective process that guides the recognition analysis towards particular regions of interest in the image.…”
Section: Related Work
confidence: 99%
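As a rough illustration of that selective process (a sketch under stated assumptions, not the method of [41,42] or of the paper under discussion), one can score class-agnostic proposals by the mean saliency they enclose and pass only the top-ranked boxes to an expensive recognizer:

```python
# Hypothetical saliency-gated proposal filtering: keep only the candidate
# boxes whose enclosed mean saliency is highest, so that a costly classifier
# runs on a small set of likely-object regions.
import numpy as np

def select_salient_proposals(saliency_map, boxes, top_k=50):
    """saliency_map: HxW array in [0, 1]; boxes: list of (x1, y1, x2, y2)."""
    scores = []
    for (x1, y1, x2, y2) in boxes:
        region = saliency_map[y1:y2, x1:x2]
        scores.append(float(region.mean()) if region.size else 0.0)
    keep = np.argsort(scores)[::-1][:top_k]
    return [boxes[i] for i in keep]
```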
“…There are methods that, learning from sequences of fixations of subjects observing images, derive bounding boxes that can be used to train object detectors [37,38]. Closely related to our scenario, some works drive active object detection in egocentric videos using gaze fixations [39] or automatically predicted saliency of pixels [40].…”
Section: Weakly Supervised Active Object Recognition
confidence: 99%
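A toy sketch of how a sequence of fixations might be turned into a training box (the cited works [37,38,39,40] use more elaborate procedures; names and parameters here are purely illustrative): discard outlying gaze points with percentile bounds, then take a slightly padded tight box around the rest.

```python
# Illustrative only: derive a bounding box from gaze fixations by trimming
# outliers with percentiles and padding the remaining extent with a margin.
import numpy as np

def fixations_to_box(fixations, lo=5, hi=95, margin=10, image_size=None):
    """fixations: Nx2 array of (x, y) gaze points; image_size: (width, height)."""
    pts = np.asarray(fixations, dtype=float)
    x_lo, x_hi = np.percentile(pts[:, 0], [lo, hi])
    y_lo, y_hi = np.percentile(pts[:, 1], [lo, hi])
    box = [x_lo - margin, y_lo - margin, x_hi + margin, y_hi + margin]
    if image_size is not None:  # clamp the box to the image bounds
        w, h = image_size
        box = [max(0.0, box[0]), max(0.0, box[1]),
               min(float(w), box[2]), min(float(h), box[3])]
    return box  # (x1, y1, x2, y2)
```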