Inferring emotions from others’ non-verbal behavior is a pervasive and fundamental task in social interactions. Typically, real-life encounters imply the co-location of interactants, i.e., their embodiment within a shared spatial-temporal continuum in which the trajectories of the interaction partner’s Expressive Body Movement (EBM) create mutual social affordances. Shared Virtual Environments (SVEs) and Virtual Characters (VCs) are increasingly used to study social perception because they allow researchers to reconcile experimental stimulus control with ecological validity. However, it remains unclear whether display modalities that enable co-presence have an impact on observers’ responses to VCs’ expressive behaviors. Drawing upon ecological approaches to social perception, we reasoned that sharing the space with a VC should amplify affordances as compared to a screen display, and consequently alter observers’ perceptions of EBM in terms of judgment certainty, hit rates, perceived expressive qualities (arousal and valence), and resulting approach and avoidance tendencies. In a between-subjects design, we compared the perception of 54 animations (10 s each) of VCs performing three daily activities (painting, mopping, sanding) in three emotional states (angry, happy, sad), displayed either in 3D as a co-located VC moving in shared space or as a 2D replay on a screen that was also placed in the SVE. Results confirm the effective experimental control of the variable of interest, showing that perceived co-presence was significantly affected by the display modality, while perceived realism and immersion showed no difference. Spatial presence and social presence showed marginal effects. Results suggest that the display modality had a minimal effect on emotion perception. A weak effect was found for the expression “happy,” for which unbiased hit rates were higher in the 3D condition. Importantly, low hit rates were observed for all three emotion categories.
However, observers’ judgments were significantly correlated, both for category assignment and across all rating dimensions, indicating universal decoding principles. Although category assignment was erroneous, ratings of valence and arousal were consistent with expectations derived from emotion theory. The study demonstrates the value of animated VCs in emotion perception studies and raises new questions regarding the validity of category-based emotion recognition measures.