Numerous studies indicate that, in many cases, spatial 3D digital environments can support information and knowledge sharing more effectively than 2D graphical user interfaces. This advantage stems from the ability of 3D spaces to present different types of content simultaneously and in a spatially meaningful arrangement. Such content can include written documents, web-based content, audio-visual materials, and even interactive 3D models. For instance, in a 3D virtual space, documents on related topics can be placed closer to each other than documents on different topics. Similarly, content that conveys key points can be made to appear larger than less prominent documents. Such spatial arrangements convey semantic relationships implicitly, allowing users to grasp the content quickly and form more memorable impressions with less effort. However, it is not well understood when and how these potential benefits of 3D environments can be realized. In this paper, we address this question in the context of desktop 3D environments. Our aim is to identify the conditions under which these environments lead to improved performance and/or lower cognitive load compared to 2D digital interfaces. Specifically, we present the results of two experiments in which we compared user performance, eye-tracking data (including pupil dilation), and a set of qualitative measures during a learning task. Participants carried out the task in a 2D environment and in two different 3D virtual reality scenarios. The notable differences between these 3D scenarios, and the corresponding differences in the results, allowed us to draw two conclusions. First, 3D spatial environments can indeed reduce cognitive load compared to 2D interfaces. Second, such reductions are more pronounced in 3D spaces that require less (virtual) spatial locomotion, even at the expense of more camera rotations.
Together, these results suggest that the ability of VR to reduce users' cognitive load may have more to do with the spatial qualities of the content arrangement than with the spatial situatedness of the users.