In this paper we examine how participants’ multimodal conduct maps onto one of the basic organizational principles of social interaction, preference organization, and how it does so in a similar manner across five different languages (Czech, French, Hebrew, Mandarin, and Romanian). Based on interactional data from these languages, we identify a recurrent multimodal practice that respondents deploy in turn-initial position in dispreferred responses to various first actions, such as information requests, assessments, proposals, and informings. The practice involves the verbal delivery of a turn-initial expression corresponding to English ‘I don’t know’ and its variants (‘dunno’), coupled with gaze aversion from the prior speaker. We show that through this ‘multimodal assembly’ respondents preface a dispreferred response within various sequence types, and we demonstrate the cross-linguistic robustness of this practice: Through the focal multimodal assembly, respondents retrospectively mark the prior action as problematic and prospectively alert co-participants to incipient resistance to the constraints set out by that action or to the stance it conveys. By evidencing how grammar and body interface in related ways across a diverse set of languages, the findings open a window onto cross-linguistic, cross-modal, and cross-cultural consistencies in human interactional conduct.