That human social interaction involves the intertwined cooperation of different modalities is uncontroversial. Researchers in several allied fields have, however, only recently begun to document the precise ways in which talk, gesture, gaze, and aspects of the material surround are brought together to form coherent courses of action. 1 The papers in this volume are attempts to develop this line of inquiry. Although the authors draw on a range of analytic, theoretical, and methodological traditions (conversation analysis, ethnography, distributed cognition, and workplace studies), all are concerned to explore and illuminate the inherently multimodal character of social interaction. Recent studies, including those collected in this volume, suggest that different modalities work together not only to elaborate the semantic content of talk but also to constitute coherent courses of action. In this introduction we present evidence for this position. We begin by reviewing some select literature focusing primarily on communicative functions and interactive organizations of specific modalities before turning to consider the integration of distinct modalities in interaction.
Semiotic modalities

As conversation analysts, we begin by observing that social interaction is most 'at home' in face-to-face interaction. This is, as Schegloff (2000a: 1) puts it, a 'species-distinctive embodiment of the primordial site of sociality.' Levinson notes along similar lines that

. . . conversation is clearly the prototypical kind of language usage, the form in which we are all first exposed to language - the matrix for language acquisition. Various aspects of pragmatic organization can be shown to be centrally organized around usage in conversation. [For instance, the] unmarked usages of grammatical encodings of temporal, spatial, social discourse parameters are organized

Face-to-face interaction is, by definition, multimodal interaction in which participants encounter a steady stream of meaningful facial expressions, gestures, body postures, head movements, words, grammatical constructions, and prosodic contours. In this chapter we follow Enfield (2005) in distinguishing between the vocal/aural and visuospatial modalities. The vocal/aural modality encompasses spoken language, including prosody. The visuospatial modality includes gesture, gaze, and body postures. As Enfield notes, these differ not only in terms of modality but also with respect to which semiotic ground plays a dominant role in their organization. Vocal/aural signs are prototypically symbolic, whereas indexicality and iconicity are more important in the visuospatial modality. 2 We want to point out that by looking at interaction from a multimodal perspective we do not mean to privilege one modality over another (e.g., visuospatial over vocal/aural) but rather to suggest that much can be gained from examining a turn-at-talk for where it is situated vocally (e.g., sequentially, prosodically, syntactically) as well as visuospatially (e.g., body orientation, facial expression, accompa...