Video-based collaborative virtual environments (CVEs) attempt to emulate face-to-face meetings by immersing remote collaborators in a shared 3D virtual setting. To investigate whether this novel type of collaborative user interface can create a better sense of social presence and afford a more efficient collaborative process, we conducted an empirical study in which pairs of users solved a simple task (matching a set of celebrity photos with a set of quotes) using four different media: face-to-face, a standard desktop videoconferencing system (VC), a desktop video-CVE, and a stereo large-screen video-CVE. As expected, results showed that face-to-face provided a significantly stronger sense of social presence than any of the systems, but relatively few differences emerged between the systems themselves. However, an ex-post analysis revealed significant gender effects across the system types: females perceived more social presence with the standard videoconferencing system and less with the video-CVE conditions, while males showed the opposite effect. Linguistic analysis of audio transcriptions and video analysis further illuminate differences between the collaboration styles of males and females across the collaborative conditions. We discuss the implications of our findings for future studies of CVEs and videoconferencing systems.