This article investigates bilingual versus monolingual construal of manner of motion in speech and gesture across three languages (Mandarin, Japanese, and English) argued to be typologically distinct in speech and co‐speech gesture (Brown & Chen, 2013; McNeill, 2001; Slobin, 2004b; Talmy, 1991). Narrative descriptions of motion were elicited in the L1 and L2 from bilingual Mandarin–English (n = 12) and Japanese–English (n = 15) speakers at an intermediate (CEFR B) level of L2 proficiency, and from monolingual speakers of Mandarin (n = 14), Japanese (n = 16), and English (n = 13). Results revealed that encoding of manner in L2 speech is characterized by universal features of development, whereas construal of manner in gesture is characterized by bidirectional interactions between properties of the source and target languages, yielding convergence between the L1 and L2, specifically in the use of manner‐highlighting gestures. The study adds to growing evidence of complex interrelationships between the L1 and L2, and it supports both a reconceptualization of what constitutes target‐like performance in the L2 and the complementary use of gesture analysis, which may provide a wider lens through which to observe the relationships between languages in the bilingual mind.