The view put forward here is that visual bodily signals play a core role in human communication and the coordination of minds. Critically, this role goes far beyond referential and propositional meaning. The human communication system that we consider to be the explanandum in the evolution of language thus is not spoken language. It is, instead, a deeply multimodal, multilayered, multifunctional system that developed—and survived—owing to the extraordinary flexibility and adaptability that it endows us with. Beyond their undisputed iconic power, visual bodily signals (manual and head gestures, facial expressions, gaze, torso movements) fundamentally contribute to key pragmatic processes in modern human communication. This contribution becomes particularly evident with a focus that includes non-iconic manual signals, non-manual signals and signal combinations. Such a focus also needs to consider meaning encoded not just via iconic mappings, since kinematic modulations and interaction-bound meaning are additional properties equipping the body with striking pragmatic capacities. Some of these capacities, or its precursors, may have already been present in the last common ancestor we share with the great apes and may qualify as early versions of the components constituting the hypothesized interaction engine.
This article is part of the theme issue ‘Revisiting the human ‘interaction engine’: comparative approaches to social action coordination’.