Abstract. A salient image region is defined as an image part that is clearly different from its surround. This difference is measured in terms of a number of attributes, namely, contrast, brightness and orientation. By measuring these attributes, visual saliency algorithms aim to predict the regions in an image that would attract our attention under free viewing conditions. As the number of saliency models has increased significantly in the past two decades, one is faced with the challenge of finding a metric that can objectively quantify the performance of different saliency algorithms. To address this issue, this article first revisits the state of the art of saliency models. Second, the major challenges associated with the evaluation of saliency models are discussed. Third, ten frequently used evaluation metrics are examined and their results are discussed for ten recent state-of-the-art saliency models.
INTRODUCTION
Our visual system is selective, i.e., we concentrate on certain aspects of a scene while neglecting others. This is evident from studies on change blindness [1-3], which show that large changes to a visual scene can go unnoticed. Our visual system is selective because our brains do not process all the visual information in a scene. In fact, while the optic nerve receives information at a rate of approximately 3 × 10⁶ bits/s, the brain processes less than 10⁴ bits/s of this information [4]. In other words, the brain uses a tiny fraction (<1%) of the collected information to build a representation of the scene, a representation that is good enough to perform a number of complex activities in the environment such as walking, aiming at objects and detecting objects. Based on this, we can ask what mechanisms are responsible for building this representation of the scene.
In the literature, two main attention mechanisms are discussed: top-down and bottom-up [5-11]. Top-down attention is voluntary, goal-driven and slow, typically operating on timescales between 100 ms and several seconds [9]. Top-down attention is assumed to be closely linked with cognitive aspects such as memory, thought and reasoning. For example, by employing top-down mechanisms, we can attend to a