The human visual system is foveated: we can see fine spatial detail in central vision, whereas resolution is poor in our peripheral visual field, and this loss of resolution follows an approximately logarithmic decrease. Additionally, our brain organizes visual input in polar coordinates. Therefore, the image projection occurring between the retina and primary visual cortex can be mathematically described by the log-polar transform. Here, we test and model how this space-variant visual processing shapes the processing of binocular disparity, a key component of human depth perception. We observe that the fovea preferentially processes disparities at fine spatial scales, whereas the visual periphery is tuned for coarse spatial scales, in line with the naturally occurring distributions of depths and disparities in the real world. We further show that the visual system integrates disparity information across the visual field in a near-optimal fashion. We develop a foveated, log-polar model that mimics the processing of depth information in primary visual cortex and that can process disparity directly in the cortical domain representation. This model takes real images as input and recreates the observed topography of human disparity sensitivity. Our findings support the notion that our foveated, binocular visual system has been moulded by the statistics of our visual environment.
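For readers unfamiliar with the log-polar transform named above, the sketch below shows one common way to resample a Cartesian image onto a log-polar (cortical-like) grid: eccentricity is sampled logarithmically, polar angle uniformly. This is an illustrative sketch only; the grid sizes (n_rings, n_wedges), the nearest-neighbour sampling, and the function name are assumptions, not parameters of the model described in the abstract.

```python
import numpy as np

def to_log_polar(image, center, n_rings=64, n_wedges=128, rho_min=1.0):
    """Resample a Cartesian image onto a log-polar (cortical-like) grid.

    Illustrative sketch: `center` is the (row, col) fixation point;
    grid sizes and sampling scheme are assumed, not taken from the paper.
    """
    h, w = image.shape[:2]
    cy, cx = center
    rho_max = np.hypot(max(cx, w - cx), max(cy, h - cy))
    # Log-spaced eccentricities, uniformly spaced polar angles.
    rhos = np.logspace(np.log10(rho_min), np.log10(rho_max), n_rings)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_wedges, endpoint=False)
    rr, tt = np.meshgrid(rhos, thetas, indexing="ij")
    # Sample the Cartesian image at each (rho, theta) location (nearest neighbour).
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    return image[ys, xs]
```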
We have developed a low-cost, practical gaze-contingent display in which natural images are presented to the observer with dioptric blur and stereoscopic disparity that depend on the three-dimensional structure of natural scenes. Our system simulates a distribution of retinal blur and depth similar to that experienced in real-world viewing conditions by emmetropic observers. We implemented the system using light-field photographs taken with a plenoptic camera, which supports digital refocusing anywhere in the image, and coupled this capability with an eye-tracking system and stereoscopic rendering. With this display, we examine how the time course of binocular fusion depends on depth cues from blur and stereoscopic disparity in naturalistic images. Our results show that disparity and peripheral blur interact to modify eye-movement behavior and facilitate binocular fusion, with the greatest benefit gained by observers who struggled most to achieve fusion. Even though plenoptic images do not replicate an individual’s aberrations, the results demonstrate that a naturalistic distribution of depth-dependent blur may improve 3-D virtual reality, and that interruptions of this pattern (e.g., with intraocular lenses) that flatten the distribution of retinal blur may adversely affect binocular fusion.
A computational model for the control of horizontal vergence, based on a population of disparity-tuned complex cells, is presented. Since the population can extract the disparity map only over a limited range, using that map to drive vergence would restrict the control to this range. Instead, the model extracts the disparity-vergence response directly by combining the outputs of the disparity detectors, without explicit calculation of the disparity map. The resulting vergence control yields stable fixation and has a short response time over a wide range of disparities. Experimental simulations with synthetic stimuli in depth validate the approach.
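As a concrete illustration of driving vergence directly from the population response, without reconstructing a disparity map, the sketch below weights each disparity-tuned cell's output by its preferred disparity and normalizes by the total activity to obtain a signed vergence command. The function name, the weighting rule, and the zero-energy fallback are assumptions for illustration, not the paper's exact decoding scheme.

```python
import numpy as np

def vergence_signal(population_responses, preferred_disparities):
    """Combine disparity-tuned complex-cell responses into a vergence command
    without computing an explicit disparity map.

    Illustrative sketch under assumed conventions: each cell's response is
    weighted by its preferred disparity, so the normalized population vote
    directly yields a signed vergence drive.
    """
    responses = np.asarray(population_responses, dtype=float)
    weights = np.asarray(preferred_disparities, dtype=float)
    total = responses.sum()
    if total <= 0:
        return 0.0  # no reliable disparity energy: hold current vergence
    # Population-weighted average of preferred disparities acts as the drive.
    return float(np.dot(weights, responses) / total)
```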
The recent release of the Oculus Rift, originally developed for entertainment applications, has re-ignited the interest of researchers and clinicians toward the use of head-mounted displays (HMDs) in basic behavioral research and physical and psychological rehabilitation. However, careful evaluation of the Oculus Rift is necessary to determine whether it can be effectively used in these novel applications. In this paper, we address two issues concerning the perceptual quality of the Oculus Rift. (i) Is the Oculus able to generate an acceptable degree of immersivity? In particular, is it possible to elicit the sensation of presence via the virtual stimuli rendered by the device? (ii) Does the Virtual Reality experienced through the Oculus Rift induce physical discomfort? To answer these questions, we employed four virtual scenarios in three separate experiments and evaluated performance with objective and subjective outcomes. In Experiment 1 we monitored observers’ heart rate and asked them to rate their Virtual Reality experience via a custom questionnaire. In Experiment 2 we monitored observers’ head movements in reaction to virtual obstacles and asked them to fill out the Simulator Sickness Questionnaire (Kennedy et al., 1993) both before and after experiencing Virtual Reality. In Experiment 3 we compared the Oculus Rift against two other low-cost devices used in immersive Virtual Reality: Google Cardboard and a standard 3DTV monitor. Observers’ heart rate increased during exposure to Virtual Reality, and they subjectively reported the experience to be immersive and realistic. We found a strong relationship between observers’ fear of heights and the vertigo experienced during one of the virtual scenarios involving heights, suggesting that observers felt a strong sensation of presence within the virtual worlds. Observers reacted to virtual obstacles by moving to avoid them, suggesting that the obstacles were perceived as real threats. Observers did not experience simulator sickness when the exposure to virtual reality was short and did not induce excessive amounts of vection. Compared to the other devices, the Oculus Rift elicited a greater degree of immersivity. Thus, our investigation suggests that the Oculus Rift HMD is a potentially powerful tool for a wide array of basic research and clinical applications.
Motion estimation has been studied extensively in neuroscience over the last two decades. Even though there has been some early interaction between the biological and computer vision communities at the modelling level, comparatively little work has been done on examining or extending the biological models in terms of their engineering efficacy on modern optical flow estimation datasets. An essential contribution of this paper is to show how a neural model can be enriched to deal with real sequences. We start from a classical V1-MT feedforward architecture: V1 cells are modelled by motion energy (based on spatio-temporal filtering) and MT pattern cells by pooling V1 cell responses. The efficacy of this architecture, and its inherent limitations in the case of real videos, are not known. To address this, we propose a velocity-space sampling of MT neurons (using a decoding scheme to obtain the local velocity from their activity) coupled with a multi-scale approach, and we evaluate the performance of our model on the Middlebury dataset. To the best of our knowledge, this is the only neural model evaluated on this dataset. The results are promising and suggest several possible improvements, in particular to better deal with discontinuities. Overall, this work provides a baseline for future developments of bio-inspired scalable computer vision algorithms, and the code is publicly available to encourage research in this direction.
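The motion-energy stage of such a V1-MT architecture can be illustrated with a quadrature pair of spatio-temporal Gabor filters whose squared outputs are summed, as sketched below. The single-orientation space-time (x, t) formulation and the filter parameters (fx, ft, sigmas) are simplifying assumptions for illustration; the filter bank, MT pooling, and velocity-decoding stages described in the abstract are omitted.

```python
import numpy as np

def motion_energy(patch, fx, ft, sigma_x=2.0, sigma_t=2.0):
    """Motion energy of a space-time image patch for one preferred velocity.

    Minimal sketch of a V1-like motion-energy unit: a quadrature pair of
    spatio-temporal Gabor filters (cosine/sine phase) is applied to the
    patch and their squared outputs are summed. Parameter values are
    assumptions, not those used in the paper's model.
    """
    nt, nx = patch.shape
    x = np.arange(nx) - nx // 2
    t = np.arange(nt) - nt // 2
    tt, xx = np.meshgrid(t, x, indexing="ij")
    envelope = np.exp(-(xx**2) / (2 * sigma_x**2) - (tt**2) / (2 * sigma_t**2))
    phase = 2 * np.pi * (fx * xx + ft * tt)
    even = np.sum(patch * envelope * np.cos(phase))
    odd = np.sum(patch * envelope * np.sin(phase))
    return even**2 + odd**2  # phase-invariant (complex-cell-like) energy
```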