2020
DOI: 10.48550/arxiv.2005.14310

Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning

Abstract: Being able to predict human gaze behavior has obvious importance for behavioral vision and for computer vision applications. Most models have mainly focused on predicting free-viewing behavior using saliency maps, but these predictions do not generalize to goal-directed behavior, such as when a person searches for a visual target object. We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search. The viewer's internal bel…

Cited by 1 publication (4 citation statements)
References 54 publications
“…02) to obtain the target map, again assuming that there are some features at the bounding box locations that are guiding attention in proportion to their target similarity. Note that, whereas more sophisticated methods have been developed for predicting search fixations (Yang et al., 2020; Zelinsky et al., 2020a), we thought it best to err on the side of interpretability when selecting a method for obtaining a target map, which is often a problem for more sophisticated deep-learning methods. Our implementation of a target map is a simple bias much like a center bias, only the bias is introduced at the detected target locations.…”
Section: Methods (citation type: mentioning; confidence: 99%)
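The statement above describes a target map as a center-bias-like weighting placed at detected bounding-box locations rather than at the image center. A minimal sketch of that idea is below, assuming numpy and using hypothetical function and parameter names (`target_map`, `sigma`, detector `scores`) that are not from the cited paper; the exact weighting the authors used may differ.

```python
import numpy as np

def gaussian_bump(h, w, cy, cx, sigma):
    """2D isotropic Gaussian centered at (cy, cx) on an h x w grid."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def target_map(shape, boxes, scores, sigma=20.0):
    """Place a Gaussian bump at each detected box center, weighted by the
    detector's target-similarity score, then normalize to a probability map.
    This is the same mechanism as a center bias, except the bias sits at the
    detections instead of the image center. All names here are illustrative."""
    h, w = shape
    m = np.zeros((h, w))
    for (x0, y0, x1, y1), s in zip(boxes, scores):
        cy, cx = (y0 + y1) / 2.0, (x0 + x1) / 2.0
        m += s * gaussian_bump(h, w, cy, cx, sigma)
    total = m.sum()
    return m / total if total > 0 else m
```

The normalization makes the map directly comparable to (or combinable with) a standard center-bias prior.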
“…After filtering out these object categories, which we did by using the corresponding MaskRCNN channels to detect these categories in the images, we were left with 145 images for analysis. Surprisingly few datasets have been developed for visual search behavior, but by far the largest is COCO-Search18 (Chen, Yang, Ahn, Samaras, Hoai, & Zelinsky, 2021; Yang et al., 2020). It consists of roughly 300,000 fixations from 10 people searching for each of 18 target-object categories in 6202 images of natural scenes.…”
Section: Methods (citation type: mentioning; confidence: 99%)