No matter how well hidden our systems are and how well they do their magic unnoticed in the background, there are times when direct interaction between system and human is a necessity. As long as the interaction can take place unobtrusively and without techno-clutter, this is desirable. It is hard to picture a means of interaction less obtrusive and less techno-cluttered than spoken communication on human terms. Spoken face-to-face communication is the most intuitive and robust form of communication between humans imaginable. To exploit human spoken communication to its full potential as an interface between human and machine, we need a much better understanding of how the more human-like aspects of spoken communication work.

A crucial aspect of face-to-face conversation is what people do, and what they take into consideration, in order to manage the flow of the interaction. For example, participants in a conversation have to be able to identify places where it is legitimate to begin to talk, as well as to avoid interrupting their interlocutors. Equally important is the ability to indicate that you want to say something, that somebody else may start talking, or that a dialog partner should refrain from doing so. We call this interaction control.

Examples of the features that play a part in interaction control include the production and perception of auditory cues, such as intonation patterns, pauses, voice quality, and various disfluencies; visual cues, such as gaze, nods, facial expressions, gestures, and visible articulatory movements; and content cues, such as pragmatic and semantic (in)completeness. People generally use these cues in combination, mixing them or shifting between them seamlessly. By equipping spoken dialog systems with more human-like interaction control abilities, we aim to move interaction between system and human toward the intuitive and robust communication found among humans.

The bulk of the work on interaction control in CHIL has focused on auditory prosodic cues, but visual cues have also been explored, especially through the use of embodied conversational agents (ECAs): human-like representations of a system, for example animated talking heads, that are able to interact with a user in a natural way using speech, gesture, and facial expression. ECAs are one way of leveraging such visual cues.
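To make the cue-combination idea above concrete, the following Python sketch shows one simple way a dialog system might fuse auditory, visual, and content cues into a turn-taking decision. It is an illustration only, not the CHIL implementation: all feature names, weights, and the decision threshold are assumptions introduced here for the example.

# Illustrative sketch: fusing multimodal cues into an end-of-turn decision.
# Every feature, weight, and threshold below is a hypothetical placeholder.

from dataclasses import dataclass

@dataclass
class CueFrame:
    """Cue observations for the current moment in the conversation."""
    pause_ms: float             # auditory cue: silence since the user last spoke
    final_pitch_falling: bool   # auditory cue: falling terminal intonation contour
    gaze_at_system: bool        # visual cue: user looks back at the agent
    utterance_complete: bool    # content cue: utterance judged pragmatically complete

def turn_yield_score(frame: CueFrame) -> float:
    """Combine the cues into a single turn-yield score in [0, 1].

    The weights are arbitrary; a real system would learn them from
    annotated conversational data rather than fix them by hand.
    """
    score = 0.0
    score += 0.35 * min(frame.pause_ms / 1000.0, 1.0)  # longer pause, stronger cue
    score += 0.25 * frame.final_pitch_falling
    score += 0.15 * frame.gaze_at_system
    score += 0.25 * frame.utterance_complete
    return score

def system_may_speak(frame: CueFrame, threshold: float = 0.6) -> bool:
    """Take the turn only when the combined evidence crosses the threshold."""
    return turn_yield_score(frame) >= threshold

if __name__ == "__main__":
    # A long pause with falling pitch, returned gaze, and a complete
    # utterance: strong combined evidence that the user has yielded the turn.
    frame = CueFrame(pause_ms=800, final_pitch_falling=True,
                     gaze_at_system=True, utterance_complete=True)
    print(system_may_speak(frame))  # True

The point of the sketch is that no single cue decides the matter; as in human conversation, evidence from several modalities is weighed together before the system takes the floor.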