Reconstructing three-dimensional (3D) scenes from two-dimensional (2D) retinal images is an ill-posed problem. Despite this, 3D perception of the world based on 2D retinal images appears to be both accurate and precise. The integration of distinct visual cues is essential for robust 3D perception in humans, but it is unclear whether this is true for non-human primates (NHPs). Here, we assessed 3D perception in macaque monkeys using a planar surface orientation discrimination task. Perception was accurate across a wide range of spatial poses (orientations and distances), but precision was highly dependent on the plane's pose. The monkeys achieved robust 3D perception by dynamically reweighting the integration of stereoscopic and perspective cues according to their pose-dependent reliabilities. Perceptual errors could be explained by a prior resembling the 3D orientation statistics of natural scenes. We used neural network simulations based on 3D orientation-selective neurons recorded from the same monkeys to assess how neural computation might constrain perception. The perceptual data were consistent with a model in which the responses of two independent neuronal populations, one representing stereoscopic cues and one representing perspective cues (with perspective signals from the two eyes combined using nonlinear canonical computations), were optimally integrated through linear summation. Perception of combined-cue stimuli was optimal given this architecture. However, an alternative architecture in which stereoscopic cues, left eye perspective cues, and right eye perspective cues were represented by three independent populations yielded twice the precision achieved by the monkeys. This result suggests that, due to canonical computations, cue integration for 3D perception is optimized but not maximized.
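Reweighting cues "according to their reliabilities" is commonly formalized as inverse-variance (maximum-likelihood) cue combination: each cue is weighted by its reliability (1/variance), and the combined reliability is the sum of the individual reliabilities. The sketch below assumes that standard model with independent Gaussian cue noise; it is not the authors' simulation code, and the function name `integrate_cues` and the example numbers are hypothetical.

```python
import math

def integrate_cues(estimates, sigmas):
    """Inverse-variance (reliability-weighted) combination of independent cues.

    Assumes each cue gives an unbiased estimate with independent Gaussian noise.
    estimates: per-cue estimates of surface orientation (e.g., slant in degrees)
    sigmas: per-cue noise standard deviations (same units)
    Returns (combined estimate, combined standard deviation).
    """
    if not estimates or len(estimates) != len(sigmas):
        raise ValueError("need one sigma per estimate")
    reliabilities = [1.0 / s ** 2 for s in sigmas]   # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]     # normalized weights sum to 1
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_sigma = math.sqrt(1.0 / total)          # combined reliability = sum of reliabilities
    return combined, combined_sigma

# Hypothetical values: a stereoscopic cue (sigma 4 deg) and a perspective cue (sigma 8 deg).
est, sd = integrate_cues([30.0, 36.0], [4.0, 8.0])
print(round(est, 2), round(sd, 2))
```

With these illustrative numbers the stereoscopic cue receives a weight of 0.8, so the combined estimate (31.2 deg) sits close to it and the combined standard deviation (about 3.58 deg) is smaller than either cue's alone. Because the weights depend on the sigmas, a pose-dependent change in a cue's noise shifts the weighting automatically. Under the same independence assumption, splitting a cue into additional independent channels (for example, separate left and right eye perspective populations) adds further reliability terms and can only lower the combined standard deviation.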
Significance Statement
Our eyes sense two-dimensional (2D) projections of the world, like a movie on a screen, but we perceive the world as three-dimensional (3D). Here, we show that non-human primates (NHPs), like humans, achieve more precise 3D vision by perceptually integrating distinct 3D cues. We also present evidence that perception is influenced by 3D natural scene statistics, and that priors over 3D orientation are subjectively encoded. Using simulations, we examine how neural computation can constrain 3D perception and estimate that perception is half as precise as theoretically possible. Our findings suggest that the concurrence of multiple canonical computations simultaneously optimizes and curbs 3D visual perception, and highlight that what constitutes optimal task performance depends on the underlying neural architecture.