For highly visual species like primates, facial and bodily emotion expressions play a crucial role in emotion perception. However, most research focuses on facial expressions, and the perception of bodily cues remains poorly understood. Using a novel comparative priming eye-tracking design, we examined whether humans and our close primate relatives, the chimpanzees (Pan troglodytes), infer emotions from bodily cues through subsequent perceptual integration with facial expressions. In Experiment 1, we primed chimpanzees with videos of the bodily movements of unfamiliar conspecifics engaged in social activities of opposite valence (play, fear), compared against neutral control scenes, to examine attentional bias towards subsequent congruent or incongruent facial expressions. In Experiment 2, we assessed the same attentional bias in humans, using stimuli showing unfamiliar humans. In Experiment 3, humans watched the chimpanzee stimuli of Experiment 1 to examine cross-species emotion perception. Chimpanzees exhibited a persistent fear-related attentional bias but did not associate bodily cues with congruent facial cues. By contrast, humans prioritized conspecifics’ congruent facial expressions (matching the bodily scenes) over incongruent ones (mismatching them). Nevertheless, humans exhibited no congruency effect when viewing chimpanzee stimuli, suggesting difficulty in cross-species emotion perception. These results highlight species differences in emotion perception: humans were strongly affected by fearful and playful bodily cues, whereas chimpanzees were strongly biased toward fearful expressions regardless of the preceding scene. These data advance our knowledge of the evolution of emotion signalling and of the presence of distinct perceptual patterns in hominids.