We speak to express ourselves. Sometimes words can capture what we mean; sometimes we mean more than can be said. This is where our visible gestures (those dynamic oscillations of our gaze, face, head, hands, arms and bodies) help. Not only do these co-verbal visual signals help express our intentions, attitudes and emotions, they also help us engage with our conversational partners to get our message across. Understanding how and when a message is supplemented, shaped and changed by auditory and visual signals is crucial for a science ultimately interested in the correct interpretation of transmitted meaning.

This special issue highlights research articles that explore co-verbal and nonverbal signals, a key topic in speech communication, since these signals are crucial ingredients in the interpretation of meaning. That is, the meaning of speech is calibrated, augmented and even changed by co-verbal/co-speech behaviours and gestures, including the talker's facial expression, eye contact, gaze direction, arm movements, hand gestures, body motion and orientation, posture, proximity, physical contact, and so on. Understanding expressive signals is also a vital step towards developing machines that can properly decipher intention and engage as social agents.

The special issue is divided into three parts: auditory-visual speech perception; characterization and perception of auditory-visual prosody; and computer-generated auditory-visual speech. Below, we introduce the papers in each part, with a brief review of relevant issues and previous studies where needed.