In research on the influence of body movement in multimodal perception, human motion displays are frequently used to standardize visual information and control for external confounders. However, no principle has been established for selecting an adequate display for a specific study purpose. The aim of this study was to evaluate the effects of 4 visual displays (point-light, stick figure, body mass, skeleton) on observers’ perception of music performances in 2 expressive conditions (immobile, projected expressiveness). Two hundred eleven participants rated 8 audio-visual samples on expressiveness, match between movement and music, and overall evaluation. The results revealed significant main effects of visual display and of expressive condition on the observers’ ratings (both p < 0.001), as well as interaction effects between the two factors (p < 0.001). Displays closer to a human form (mostly skeleton, sometimes body mass) raised the ratings of expressiveness and music-movement match in the projected expressiveness condition, and of overall evaluation in the immobile condition; the opposite trend occurred with the simplified motion display (stick figure). Projected expressiveness performances were rated higher than immobile performances. Although the expressive conditions remained distinguishable across displays, the more complex displays enhanced the attribution of subjective qualities. We underline the importance of treating display type as an influencing factor in perceptual studies.