Visual attention is the ability of the human vision system to detect salient parts of the scene, on which higher vision tasks, such as recognition, can focus. In human vision, it is believed that visual attention is intimately linked to the eye movements and that the fixation points correspond to the location of the salient scene parts. In computer vision, the paradigm of visual attention has been widely investigated and a saliencybased model of visual attention is now available that is commonly accepted and used in the field, despite the fact that its biological grounding has not been fully assessed. This work proposes a new method for quantitatively assessing the plausibility of this model by comparing its performance with human behavior. The basic idea is to compare the map of attention -the saliency map -produced by the computational model with a fixation density map derived from eye movement experiments. This human attention map can be constructed as an integral of single impulses located at the positions of the successive fixation points. The resulting map has the same format as the computer-generated map, and can easily be compared by qualitative and quantitative methods. Some illustrative examples using a set of natural and synthetic color images show the potential of the validation method to assess the plausibility of the attention model.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.