Image quality assessment of immersive content, and more specifically of 360-degree images, is still in its infancy. Many challenges remain, including the sphere versus projected representation and the properties of the human visual system (HVS) in a 360-degree environment. In this paper, we propose the use of convolutional neural networks (CNNs) to design a no-reference model for predicting the visual quality of 360-degree images. Instead of feeding the CNN with equirectangular projections (ERPs), visually important viewports are extracted based on visual scan-path prediction and fed to a multi-channel CNN built on DenseNet-121. Moreover, information about visual fixations and the just noticeable difference (JND) is used to account for HVS properties and bring the network closer to human judgment. The scan-path is also used to create multiple instances of the database, allowing a robust generalization analysis and compensating for the scarcity of databases.
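To make the multi-channel idea concrete, the following minimal sketch (PyTorch) shows a shared DenseNet-121 backbone scoring each extracted viewport and averaging the per-viewport scores into one quality prediction. The class name `MultiViewportIQA`, the pooling strategy, and the viewport count are illustrative assumptions rather than the paper's exact design, and the fixation/JND cues are omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiViewportIQA(nn.Module):
    """Shared DenseNet-121 over N viewports, pooled to one quality score.

    A sketch only: the real model also integrates fixation and JND cues.
    """

    def __init__(self, n_viewports: int = 6):
        super().__init__()
        backbone = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
        self.features = backbone.features             # shared feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.regressor = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1)
        )
        self.n_viewports = n_viewports

    def forward(self, x):
        # x: (batch, n_viewports, 3, H, W) -> (batch,) predicted quality
        b, v, c, h, w = x.shape
        f = torch.relu(self.features(x.reshape(b * v, c, h, w)))
        f = self.pool(f).flatten(1).reshape(b, v, -1)  # per-viewport features
        scores = self.regressor(f).squeeze(-1)         # one score per viewport
        return scores.mean(dim=1)                      # average across viewports

# Usage: two images, six 224x224 viewports each -> two quality scores
model = MultiViewportIQA(n_viewports=6)
out = model(torch.randn(2, 6, 3, 224, 224))            # shape (2,)
```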
360-degree image quality assessment (IQA) faces the major challenge of a lack of ground-truth databases. This problem is accentuated for deep-learning-based approaches, whose performance is only as good as the available data. In this context, only two databases are available to train and validate deep-learning-based IQA models. To compensate for this lack, a data-augmentation technique is investigated in this paper. We use visual scan-paths to increase the number of learning examples drawn from existing training data. Multiple scan-paths are predicted to account for the diversity of human observers, and these scan-paths are then used to select viewports from the spherical representation. Training with this data-augmentation scheme improved performance over training without it. We also address the question of whether the MOS obtained for the whole 360-degree image should serve as the quality anchor for the entire set of extracted viewports, in comparison to labels derived from 2D blind quality metrics. The comparison showed the superiority of using the MOS when adopting patch-based learning.
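As a rough illustration of this augmentation scheme, the sketch below extracts a gnomonic (rectilinear) viewport from an ERP image at each predicted fixation point. The function name, field of view, and fixation list are illustrative assumptions, and the scan-path predictor itself is assumed to exist upstream.

```python
import numpy as np
import cv2

def extract_viewport(erp, lon_c, lat_c, fov_deg=90.0, size=224):
    """Gnomonic viewport centered at (lon_c, lat_c), both in radians."""
    h, w = erp.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)   # focal length (pixels)
    i, j = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    x = (j - size / 2 + 0.5) / f                       # camera-frame ray dirs
    y = -(i - size / 2 + 0.5) / f                      # image rows point down
    z = np.ones_like(x)
    n = np.sqrt(x * x + y * y + 1.0)
    x, y, z = x / n, y / n, z / n
    # Pitch by lat_c (around x-axis), then yaw by lon_c (around y-axis)
    y2 = y * np.cos(lat_c) + z * np.sin(lat_c)
    z1 = -y * np.sin(lat_c) + z * np.cos(lat_c)
    x2 = x * np.cos(lon_c) + z1 * np.sin(lon_c)
    z2 = -x * np.sin(lon_c) + z1 * np.cos(lon_c)
    lon = np.arctan2(x2, z2)                           # [-pi, pi]
    lat = np.arcsin(np.clip(y2, -1.0, 1.0))            # [-pi/2, pi/2]
    map_x = ((lon / (2 * np.pi) + 0.5) * w) % w        # ERP pixel coords
    map_y = np.clip((0.5 - lat / np.pi) * h, 0, h - 1)
    return cv2.remap(erp, map_x.astype(np.float32), map_y.astype(np.float32),
                     cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)

# Each predicted scan-path yields a different viewport set from the same
# image; under the patch-based scheme every viewport inherits the
# image-level MOS as its label. The ERP array and fixations are stand-ins.
erp_image = np.zeros((960, 1920, 3), np.uint8)         # placeholder ERP frame
fixations = [(0.3, 0.1), (-1.2, 0.4), (2.0, -0.2)]     # (lon, lat) in radians
views = [extract_viewport(erp_image, lo, la) for lo, la in fixations]
```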
The use of convolutional neural networks (CNNs) for image quality assessment (IQA) has become the focus of many researchers. Various pre-trained models are fine-tuned and used for this task. In this paper, we conduct a benchmark study of seven state-of-the-art pre-trained models for the IQA of omnidirectional images. To this end, we first re-train these models on an omnidirectional database and compare their performance with that of the pre-trained versions. Then, we compare the use of viewports versus equirectangular (ERP) images as inputs to the models. Finally, for the viewport-based models, we explore the impact of the number of input viewports on the models' performance. Experimental results demonstrate the performance gain of the re-trained CNNs over their pre-trained versions. Also, the viewport-based approach outperformed the ERP-based one independently of the number of selected views.
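A minimal sketch of such a re-training setup, assuming a PyTorch workflow: each pre-trained backbone has its ImageNet classification head swapped for a single-output regression head before fine-tuning on the omnidirectional database. The backbone list and hyper-parameters below are placeholders, not the benchmark's actual seven models or settings.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_iqa_model(ctor):
    """Replace a pre-trained CNN's classification head with a
    single-output quality-regression head."""
    net = ctor(weights="DEFAULT")                     # ImageNet weights
    if hasattr(net, "fc"):                            # ResNet family
        net.fc = nn.Linear(net.fc.in_features, 1)
    elif isinstance(net.classifier, nn.Sequential):   # VGG family
        net.classifier[-1] = nn.Linear(net.classifier[-1].in_features, 1)
    else:                                             # DenseNet family
        net.classifier = nn.Linear(net.classifier.in_features, 1)
    return net

# Fine-tune each candidate backbone by regressing the MOS; PLCC/SROCC
# would be the usual correlation metrics for evaluation.
for ctor in (models.resnet50, models.densenet121, models.vgg16):
    model = make_iqa_model(ctor)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.MSELoss()
```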