Previously, no-reference (NR) stereoscopic 3D (S3D) image quality assessment (IQA) algorithms have been limited to extracting hand-crafted features based on an incomplete understanding of the human visual system or natural scene statistics. Furthermore, compared with full-reference (FR) S3D IQA metrics, it is difficult to achieve competitive quality-score predictions using such features, which are not optimized with respect to human opinion. To overcome this limitation of the conventional approach, we introduce a novel deep learning scheme for NR S3D IQA based on local-to-global feature aggregation. A deep convolutional neural network (CNN) model is trained in a supervised manner through two-step regression. First, to overcome the lack of training data, local patch-based CNNs are modeled, and an FR S3D IQA metric is used to approximate reference ground-truth labels for training the CNNs. The automatically extracted local abstractions are then aggregated into global features by inserting an aggregation layer into the deep structure. The locally trained model parameters are subsequently updated iteratively using supervised global labels, i.e., subjective mean opinion scores (MOS). In particular, the proposed deep NR S3D image quality evaluator does not estimate depth from the S3D image pair. The S3D image quality scores predicted by the proposed method represent a significant improvement over those of previous NR S3D IQA algorithms. Indeed, the accuracy of the proposed method is competitive with FR S3D IQA metrics, achieving ~91% correlation with subjective MOS.
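The two-step regression described above can be sketched in simplified form. The snippet below is a minimal numpy illustration, not the paper's CNN: a linear map stands in for the patch-level network, synthetic features stand in for image patches, and the FR-metric proxy labels and MOS values are randomly generated assumptions. It only shows the training flow: local regression against proxy labels, mean-pool aggregation, then a global fit against MOS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the patch-level CNN: a linear map from
# patch features to a local quality score. All data below is synthetic.
n_images, patches_per_image, feat_dim = 20, 16, 8

patch_feats = rng.normal(size=(n_images, patches_per_image, feat_dim))
true_w = rng.normal(size=feat_dim)
# Proxy labels play the role of the FR S3D IQA metric's scores.
proxy_scores = patch_feats @ true_w + 0.1 * rng.normal(size=(n_images, patches_per_image))
# Global labels play the role of subjective MOS.
mos = proxy_scores.mean(axis=1) + 0.05 * rng.normal(size=n_images)

# Step 1: local regression against the FR-metric proxy labels.
X = patch_feats.reshape(-1, feat_dim)
w_local, *_ = np.linalg.lstsq(X, proxy_scores.ravel(), rcond=None)

# Aggregation layer: mean-pool local scores into one global feature per image.
global_feat = (patch_feats @ w_local).mean(axis=1, keepdims=True)

# Step 2: global regression against MOS (affine calibration of the pooled score).
A = np.hstack([global_feat, np.ones((n_images, 1))])
w_global, *_ = np.linalg.lstsq(A, mos, rcond=None)
pred = A @ w_global

corr = np.corrcoef(pred, mos)[0, 1]
```

In the actual method the two steps update shared CNN parameters iteratively; here they are kept as two closed-form least-squares fits purely to make the local-to-global flow visible.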
Virtual reality (VR) experiences often elicit a negative effect, cybersickness, which results in nausea, disorientation, and visual discomfort. To quantitatively analyze the degree of cybersickness depending on various attributes of VR content (i.e., camera movement, field of view, path length, frame reference, and controllability), we generated cybersickness reference (CYRE) content with 52 VR scenes that represent different content attributes. A protocol for cybersickness evaluation was designed to collect subjective opinions from 154 participants as reliably as possible, in conjunction with objective data such as rendered VR scenes and biological signals. By investigating the data obtained through the experiment, we separately identify the statistically significant relationships, i.e., the degree to which cybersickness varies with each isolated content factor. We show that cybersickness severity is highly correlated with six biological features reflecting brain activity (relative power spectral densities of the Fp1 delta, Fp1 beta, Fp2 delta, Fp2 gamma, T4 delta, and T4 beta waves), with a coefficient of determination greater than 0.9. Moreover, our experimental results show that individual characteristics (age and susceptibility) are also quantitatively associated with cybersickness level. Notably, the constructed dataset contains labels (subjective cybersickness scores) corresponding to each VR scene. We used these labels to build cybersickness prediction models and obtained reliable predictive performance. Hence, the proposed dataset should be widely applicable to general-purpose scenarios involving cybersickness quantification.
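A relative power spectral density (PSD) feature of the kind used above can be sketched as follows. This is a hedged illustration, not the study's pipeline: the EEG segments are synthetic, the sampling rate is an assumed value, and the sickness scores are random. It only shows how band power relative to total power is computed from an FFT spectrum and fed to a linear fit whose goodness is reported as a coefficient of determination (R²).

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 128  # assumed EEG sampling rate (Hz)

def relative_band_power(sig, band, fs):
    """Band power divided by total power, from the one-sided FFT spectrum."""
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(sig)) ** 2
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].sum() / psd.sum()

# Synthetic example: each trial's delta content tracks a hypothetical
# sickness score (the real study uses Fp1/Fp2/T4 channel recordings).
bands = {"delta": (0.5, 4), "beta": (13, 30), "gamma": (30, 45)}
n_trials, n_samples = 40, 4 * fs
t = np.arange(n_samples) / fs
sickness = rng.uniform(0, 1, n_trials)

feats = []
for s in sickness:
    sig = (s * np.sin(2 * np.pi * 2 * t)           # delta-band component
           + 0.5 * np.sin(2 * np.pi * 20 * t)      # beta-band component
           + 0.2 * rng.normal(size=n_samples))     # broadband noise
    feats.append([relative_band_power(sig, b, fs) for b in bands.values()])
feats = np.asarray(feats)

# Linear fit of sickness on the relative-PSD features; R^2 as in the text.
A = np.hstack([feats, np.ones((n_trials, 1))])
coef, *_ = np.linalg.lstsq(A, sickness, rcond=None)
resid = sickness - A @ coef
r2 = 1 - resid.var() / sickness.var()
```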
To maximize the presence experienced by humans, visual content has evolved toward higher visual presence through a series of formats: high definition (HD), ultra HD (UHD), 8K UHD, and 8K stereoscopic 3D (S3D). Several studies have examined the visual presence delivered by content when viewing UHD S3D from a content-analysis perspective. Nevertheless, no clear definition of visual presence has been presented, and assessments have relied solely on subjective evaluation. The main reason is that content information alone is insufficient to define visual presence. In this paper, we define visual presence for each viewing environment and investigate a novel methodology to measure the visual presence experienced when viewing both 2D and 3D content, via a new metric termed the volume of visual information, which quantifies the influence of the viewing geometry between the display and the viewer. To achieve this goal, the viewing geometry and display parameters of both flat and atypical displays are analyzed in terms of human perception by introducing a novel concept of pixel-wise geometry. In addition, perceptual weighting through analysis of content information is performed in accordance with monocular and binocular vision characteristics. Experimental results show that the constructed model, based on the viewing geometry, content, and perceptual characteristics, achieves a high correlation of about 84% with subjective evaluations.
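One concrete quantity a pixel-wise viewing geometry can capture is the visual angle each pixel subtends, which depends on its offset from the screen center and on the viewing distance. The sketch below is an assumption-laden illustration, not the paper's metric: the pixel pitch, viewing distance, and flat-display setup are all hypothetical values chosen only to show that equal pixels carry unequal visual information across the screen.

```python
import numpy as np

def pixel_visual_angle(x_mm, d_mm, pitch_mm):
    """Visual angle (degrees) subtended by a pixel of width `pitch_mm`
    centred `x_mm` from the screen centre, viewed head-on from `d_mm` away."""
    a = np.degrees(np.arctan((x_mm + pitch_mm / 2) / d_mm))
    b = np.degrees(np.arctan((x_mm - pitch_mm / 2) / d_mm))
    return a - b

pitch = 0.5        # assumed pixel pitch in mm
distance = 1000.0  # assumed viewing distance in mm
xs = np.linspace(0, 500, 6)  # horizontal offsets from the screen centre (mm)
angles = pixel_visual_angle(xs, distance, pitch)

# Peripheral pixels subtend smaller angles than central ones, so a
# per-pixel geometric weight varies across a flat display.
```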
The human visual system perceives 3D depth following sensing via its binocular optical system, a series of massively parallel processing units, and a feedback system that controls the mechanical dynamics of eye movements and the crystalline lens. The processes of accommodation (focusing of the crystalline lens) and binocular vergence are controlled simultaneously and symbiotically via cross-coupled communication between these two critical depth computation modalities. The output responses of the two subsystems, which are induced by oculomotor control, are used in the computation of a clear and stable cyclopean 3D image from the input stimuli. These subsystems operate in smooth synchronicity when one is viewing the natural world; however, conflicting responses can occur when viewing stereoscopic 3D (S3D) content on fixed displays, causing physiological discomfort. If such occurrences could be predicted, then they might also be avoided (by modifying the acquisition process) or ameliorated (by changing the relative scene depth). Toward this end, we have developed a dynamic accommodation and vergence interaction (DAVI) model that successfully predicts visual discomfort on S3D images. The DAVI model is based on the phasic and reflex responses of the fast fusional vergence mechanism. Quantitative models of accommodation and vergence mismatches are used to predict visual discomfort. Other 3D perceptual elements are included in the proposed method, including sharpness limits imposed by the depth of focus and fusion limits implied by Panum's fusional area. The DAVI predictor is created by training a support vector machine on features derived from the proposed model and on recorded subjective assessment results. Experimental results show that the predictor produces accurate estimates of experienced visual discomfort.
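The core conflict described above, accommodation held at the screen while vergence follows the virtual depth, can be expressed as a simple mismatch feature in diopters. The snippet below is a minimal sketch under assumed values (interpupillary distance, viewing distance, screen disparities), not the DAVI model itself, which additionally models the dynamics of the fast fusional vergence mechanism.

```python
IPD_MM = 65.0  # assumed interpupillary distance in mm

def virtual_depth_mm(view_dist_mm, disparity_mm):
    """Depth of the fused point from screen disparity.

    Convention: positive disparity = uncrossed (behind the screen),
    negative = crossed (in front of the screen)."""
    return view_dist_mm * IPD_MM / (IPD_MM - disparity_mm)

def av_mismatch_diopters(view_dist_mm, disparity_mm):
    """Absolute accommodation-vergence mismatch in diopters (1/m)."""
    accommodation = 1000.0 / view_dist_mm  # focus stays at the screen plane
    vergence = 1000.0 / virtual_depth_mm(view_dist_mm, disparity_mm)
    return abs(vergence - accommodation)

d = 2000.0  # assumed viewing distance in mm
# Crossed, zero, and uncrossed disparities of 20 mm on the screen.
mismatches = [av_mismatch_diopters(d, disp) for disp in (-20.0, 0.0, 20.0)]
```

Zero disparity places the virtual point on the screen, so the mismatch vanishes; crossed and uncrossed disparities both drive vergence away from the accommodative plane, which is the conflict the DAVI features quantify.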