A well-known phenomenon of multimodal language is the synchronous coupling of prosodic contours in speech with salient kinematic changes in co-speech hand-gesture motions. Psychologists have invariably taken such coupling to require a dedicated neural-cognitive mechanism that preplans speech and gesture trajectories. Recently, in a continuous vocalization task, it was found that acoustic peaks unintentionally appear in vocalizations when gesture motions reach peaks in physical impetus, suggesting a biomechanical basis for gesture-speech synchrony (Pouw, Harrison, & Dixon, 2019). However, from this rudimentary study it is still difficult to draw strong conclusions about gesture-speech dynamics in more complex speech, or about the precise biomechanical nature of these effects. Here we assess how the timing of the physical impetus of a gesture relates to its effect on acoustic parameters of a mono-syllabic consonant-vowel (CV) vocalization (/pa/). Furthermore, we assess how chest-wall kinematics are affected by gesturing, and whether this modulates the effect of gestures on acoustics. In the current exploratory analysis, we analyze a subset (N = 4) of an already collected dataset (N = 36), which serves as the basis for a pre-registration of the confirmatory analyses yet to be completed. We provide exploratory evidence that gestures affect acoustics (amplitude envelope and F0) as well as chest-wall kinematics during mono-syllabic vocalizations. These effects are more extreme when a gesture’s peak impetus occurs closer to the center of the vowel vocalization event. If the current findings can be replicated in confirmatory fashion, there is a more compelling case to be made that gesture-speech physics is an important facet of multimodal synchrony.