Crowding refers to the phenomenon of reduced recognition performance for peripherally presented targets that are flanked by similar stimuli. Crowding is known to vary with lateral distances (i.e., with target eccentricity and inter-character spacing). In the present experiment, we examined how crowding is affected by the distance of the stimuli in depth under natural viewing, i.e., binocular observation of a real-depth presentation. Real-depth presentation was created by superimposing the displays of two orthogonally arranged screens with a half-transparent mirror. We measured recognition performance for flanked compared to isolated targets that were presented at fixation depth or at depths deviating from fixation depth (defocused). For both directions of defocus (i.e., in front of and behind fixation depth), a near as well as a far distance from fixation depth was applied. Participants' task was to fixate a central cross at a constant distance (190 cm) and to indicate the gap position of an isolated or flanked Landolt ring that was presented at an eccentricity of 2°, at, in front of, or behind fixation depth. Results for natural binocular observation revealed stronger crowding effects when stimuli were far from the fixation plane in depth than when they were near it. This resembles the common effect of eccentricity. Under monocular viewing, that is, without disparity information, crowding did not increase with increasing depth distance. Thus, the result seemed to be an effect of binocular observation in real depth. This suggests that crowding in natural viewing might serve as a mechanism to stabilize and orient attention efficiently in three-dimensional space.