Recently, the claim has been put forward that grammar emerges from embodied conduct. This has prompted a discussion in multimodal conversation analysis and interactional linguistics about whether the routinization of embodied actions can be described in terms of grammar and grammaticalization. While particular items such as exophoric demonstratives and gestures are routinely delivered as multimodal constructions, i.e., as part of grammar, it is debatable whether this also holds for other candidates, e.g., loose couplings of verbal and embodied conduct, and locally routinized or ephemeral gestalts that do not endure beyond the context of their use. My paper contributes to this discussion by proposing a distinction between two kinds of multimodal gestalts: socially sedimented multimodal gestalts (multimodal constructions) and locally assembled, ephemeral multimodal gestalts. To this end, I examine sedimented couplings of demonstratives and embodied practices in instructions, and the change of a locally assembled format over time. The data are in German and come from 12 hours of video recordings of self-defense training sessions for young women. In the course of the participants’ interactional history, the multimodal format of their actions changes. The changes concern formal and functional aspects of the resources used to accomplish those actions, their multimodal orchestration, and the temporality of their delivery. The paper makes four claims: 1. in their primordial use in co-present interaction, demonstratives are coupled with embodied practices and request addressees’ attention to the speaker’s body, i.e., they are tightly and intercorporeally coupled with the embodied conduct of the participants; 2. gesturally used demonstratives are socially sedimented multimodal gestalts, i.e., multimodal constructions; 3. multimodal gestalts may be subject to transformations in the course of multiple repetitions; 4. in my data, the transformations lead to the emergence of a new, reduced format, which, while being locally routinized, is neither grammatical nor grammaticalized.