Cross-modal mappings of auditory stimuli reveal valuable insights into how humans make sense of sound and music. Researchers have investigated cross-modal mappings of sound features varied in isolation, using paradigms such as speeded classification and forced-choice matching tasks. However, investigations in which concurrently varied sound features (e.g., pitch, loudness and tempo) are represented with overt gestures are scant, even though this approach accounts for the intrinsic link between movement and sound. To explore the role of bodily gestures in cross-modal mappings of auditory stimuli, we asked 64 musically trained and untrained participants to represent pure tones with gestures while the sound stimuli were played; the tones sounded continuously and were concurrently varied in pitch, loudness and tempo. We hypothesized that musical training would lead to more consistent mappings between pitch and height, between loudness and distance/height, and between tempo and both speed of hand movement and muscular energy. Our results corroborate previously reported associations between pitch and height (higher pitch leading to higher elevation in space) and between tempo and speed (increasing tempo leading to increasing speed of hand movement). They also reveal novel findings pertaining to musical training, which influenced the consistency of pitch mappings and annulled a commonly observed bias for convex (i.e., rising–falling) pitch contours. Moreover, we reveal effects of interactions between musical parameters on cross-modal mappings (e.g., of pitch and loudness on speed of hand movement), highlighting the importance of studying auditory stimuli concurrently varied in different musical parameters. Results are discussed in light of cross-modal cognition, with particular emphasis on studies within (embodied) music cognition. Implications for theoretical refinements and potential clinical applications are outlined.