SUMMARY We humans can instantaneously detect the regions in a visual scene that are most likely to contain something of interest. Exploiting this pre-selection mechanism, called visual attention, in image and video processing systems would make them more sophisticated and therefore more useful. This paper briefly describes various computational models of human visual attention and their development, as well as related psychophysical findings. In particular, our objective is to carefully distinguish several types of studies related to human visual attention and saliency as a measure of attentiveness, and to provide a taxonomy from several viewpoints such as the main objective, the use of additional cues and mathematical principles. Finally, this survey discusses possible future directions for research into human visual attention and saliency computation.
key words: human visual attention, computational model, saliency, bottom-up, top-down
Motivation

Developing sophisticated algorithms for detecting and recognizing objects in images and video has been a long-standing challenge in pattern recognition and computer vision research. In fact, a huge number of studies, techniques and theories related to object detection and recognition have already been developed. In particular, several methods for detecting specific categories of objects such as human bodies and human faces have already been put to practical use in, for example, surveillance, authentication and the human-centric enhancement of image quality, making the best possible use of prior knowledge about the target objects (human bodies and faces) [1], [2]. However, generic object detection and recognition without any constraints on the target objects has remained a major challenge, because (1) various kinds of objects might constitute the targets and (2) target objects in the same category might have different appearances due to variations among instances within a category, illumination changes and so on.

†† The author is with the Graduate School of Informatics, Kyoto University, Kyoto-shi, 606-8501 Japan.
††† The author is with the Graduate School of Information Science, Nagoya University, Nagoya-shi, 464-8603 Japan.
a) E-mail: akisato@ieee.org
b) E-mail: yonetani@vision.kuee.kyoto-u.ac.jp
c) E-mail: hirayama@is.nagoya-u.ac.jp
DOI: 10.1587/transinf.E96.D.562

On the other hand, human beings seem to be able to detect various kinds of objects without any thought or effort. For example, in Fig. 1 (left), we can easily and instantly detect a red car, a blue traffic sign and a broad white line. Visual attention [3] is considered to play an important role in achieving this function. Visual attention is one of the built-in mechanisms of the human visual system; it quickly selects the regions in a visual scene that are most likely to contain items of interest.
Such a pre-selection mechanism focusing only on relevant data would be essential in enabling computers to undertake subsequent processing such as generic o...