In visual saliency estimation, one of the most challenging tasks is to distinguish targets and distractors that share certain visual attributes. With the observation that such targets and distractors can sometimes be easily separated when projected to specific subspaces, we propose to estimate image saliency by learning a set of discriminative subspaces that perform the best in popping out targets and suppressing distractors. Toward this end, we first conduct principal component analysis on massive randomly selected image patches. The principal components, which correspond to the largest eigenvalues, are selected to construct candidate subspaces since they often demonstrate impressive abilities to separate targets and distractors. By projecting images onto various subspaces, we further characterize each image patch by its contrasts against randomly selected neighboring and peripheral regions. In this manner, the probable targets often have the highest responses, while the responses at background regions become very low. Based on such random contrasts, an optimization framework with pairwise binary terms is adopted to learn the saliency model that best separates salient targets and distractors by optimally integrating the cues from various subspaces. Experimental results on two public benchmarks show that the proposed approach outperforms 16 state-of-the-art methods in human fixation prediction.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.