Abstract-This paper introduces a new computational visual-attention model for constructing static and dynamic saliency maps. First, we use the Earth Mover's Distance (EMD) to measure the center-surround difference in the receptive field, instead of the Difference-of-Gaussian filter that is widely used in previous visual-attention models. Second, we combine different features in two biologically inspired nonlinear steps: subsets of basic features are first merged into a set of super features using the $L_m$-norm, and the super features are then combined using the Winner-Take-All mechanism. Third, we extend the proposed model to construct dynamic saliency maps from videos by using EMD to compute the center-surround difference in the spatiotemporal receptive field. We evaluate the performance of the proposed model on both static image data and video data. Comparison results show that the proposed model outperforms several existing models under a unified evaluation setting.

Index Terms-Visual attention, saliency maps, dynamic saliency maps, earth mover's distance (EMD), spatiotemporal receptive field (STRF)
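As a rough illustration of the two main ingredients named in the abstract, the following minimal Python sketch computes an EMD-based center-surround difference and then applies the two-step feature combination. It rests on several assumptions not stated in the abstract: the center and surround regions are summarized as 1-D histograms with unit-spaced bins, the exponent `m` is chosen arbitrarily, and the Winner-Take-All step is modeled as a simple maximum. The function names `emd_1d`, `lm_norm`, and `combine_features` are hypothetical and are not taken from the paper.

```python
def emd_1d(hist_center, hist_surround):
    """Earth Mover's Distance between two 1-D histograms.

    Assumes unit spacing between adjacent bins and normalizes both
    histograms to unit mass. In 1-D, the EMD then equals the sum of
    absolute differences between the two cumulative distributions.
    """
    if len(hist_center) != len(hist_surround):
        raise ValueError("histograms must have the same number of bins")
    total_c, total_s = sum(hist_center), sum(hist_surround)
    if total_c <= 0 or total_s <= 0:
        raise ValueError("histograms must have positive total mass")

    carried = 0.0  # mass that still has to be moved past the current bin
    work = 0.0     # accumulated transport cost
    for c, s in zip(hist_center, hist_surround):
        carried += c / total_c - s / total_s
        work += abs(carried)
    return work


def lm_norm(values, m):
    """L_m-norm of a sequence of feature responses (m is an assumed parameter)."""
    return sum(abs(v) ** m for v in values) ** (1.0 / m)


def combine_features(feature_groups, m=2.0):
    """Two-step combination at a single location.

    Step 1: merge each subset of basic features into one super feature
            using the L_m-norm.
    Step 2: Winner-Take-All across super features, modeled here as max().
    """
    super_features = [lm_norm(group, m) for group in feature_groups]
    return max(super_features)


if __name__ == "__main__":
    # Center mass sits in the first bin, surround mass in the last bin.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
    # Two hypothetical feature subsets at one pixel.
    print(combine_features([[3.0, 4.0], [1.0, 1.0]], m=2.0))
```

Unlike a Difference-of-Gaussian response, which compares weighted averages, the EMD in this sketch compares the full distributions of the center and surround, so two regions with the same mean but different histograms still produce a nonzero difference.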