Our understanding of the cognitive and neural underpinnings of language has traditionally been firmly based on spoken Indo-European languages and on language studied as speech or text. However, in face-to-face communication, language is multimodal: speech signals are invariably accompanied by visual information on the face and in manual gestures, and sign languages deploy multiple channels (hands, face and body) in utterance construction. Moreover, the narrow focus on spoken Indo-European languages has entrenched the assumption that language is comprised wholly by an arbitrary system of symbols and rules. However, iconicity (i.e. resemblance between aspects of communicative form and meaning) is also present: speakers use iconic ges- tures when they speak; many non-Indo-European spoken languages exhibit a substantial amount of iconicity in word forms and, finally, iconicity is the norm, rather than the exception in sign languages. This introduction provides the motivation for taking a multimodal approach to the study of language learning, processing and evolution, and discusses the broad implications of shifting our current dominant approaches and assumptions to encompass multimodal expression in both signed and spoken languages