Existing deep architectures for visual saliency prediction suffer from inefficient feature encoding, long inference times, and a large number of model parameters. One possible solution is to make local and global contextual feature extraction less computationally intensive through a novel, lighter architecture. In this work, we propose an end-to-end learnable architecture for saliency prediction built from residual blocks that share information across scales. A series of these blocks is used for efficient multi-scale feature extraction, followed by a dilated inception module (DIM) and a novel decoder. We name these blocks cross-concatenated multi-scale residual (CMR) blocks and the resulting network CMRNet. We comprehensively evaluate our architecture on three datasets: SALICON, MIT1003, and MIT300. Experimental results show that our model performs on par with other state-of-the-art models. In particular, our model outperforms all the other models in terms of inference time and number of model parameters.
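Dilated convolutions, the building block of a dilated inception module such as the DIM mentioned above, enlarge the spatial region a kernel covers without adding parameters, which is one way multi-scale context can be captured cheaply. The sketch below is a generic illustration of that arithmetic, not CMRNet's actual DIM: the 3x3 kernel, the 64 input and output channels, and the dilation rates 1, 2, 4, 8 are hypothetical choices.

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Spatial extent covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Learnable parameters of a k x k convolution; independent of dilation."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dilated_branches(k: int, c_in: int, c_out: int, rates):
    """Summarize parallel dilated branches of a generic inception-style module.

    Returns (dilation, covered extent, parameter count) for each branch.
    """
    return [(d, effective_kernel_size(k, d), conv_params(k, c_in, c_out)) for d in rates]


if __name__ == "__main__":
    # Hypothetical configuration: 3x3 kernels, 64 -> 64 channels, rates 1, 2, 4, 8.
    for d, extent, params in dilated_branches(3, 64, 64, [1, 2, 4, 8]):
        print(f"dilation={d}: covers {extent}x{extent}, params={params}")
    # A dense (undilated) kernel covering the same 17x17 extent as dilation 8:
    print("dense 17x17 params:", conv_params(17, 64, 64))
```

Every dilated branch keeps 36,928 parameters while its coverage grows from 3x3 to 17x17; a dense 17x17 kernel with the same coverage would need 1,183,808.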