No matter how well hidden our systems are and how well they do their magic unnoticed in the background, there are times when direct interaction between system and human is a necessity. As long as the interaction can take place unobtrusively and without techno-clutter, this is desirable. It is hard to picture a means of interaction less obtrusive and less techno-cluttered than spoken communication on human terms. Spoken face-to-face communication is the most intuitive and robust form of communication between humans imaginable. To exploit human spoken communication to its full potential as an interface between human and machine, we need a much better understanding of how the more human-like aspects of spoken communication work.

A crucial aspect of face-to-face conversation is what people do, and what they take into consideration, in order to manage the flow of the interaction. For example, participants in a conversation have to be able to identify places where it is legitimate to begin to talk, as well as to avoid interrupting their interlocutors. Equally important is the ability to indicate that you want to say something, that somebody else may start talking, or that a dialog partner should refrain from doing so. We call this interaction control.

Examples of the features that play a part in interaction control include the production and perception of auditory cues, such as intonation patterns, pauses, voice quality, and various disfluencies; visual cues, such as gaze, nods, facial expressions, gestures, and visible articulatory movements; and content cues, such as pragmatic and semantic (in)completeness. People generally use these cues in combination, mixing them or shifting between them seamlessly. By equipping spoken dialog systems with more human-like interaction control abilities, we aim to move interaction between system and human toward the intuitive and robust communication found among humans.

The bulk of the work on interaction control in CHIL has focused on auditory prosodic cues, but visual cues have also been explored, especially through the use of embodied conversational agents (ECAs): human-like representations of a system, for example animated talking heads, that are able to interact with a user in a natural way using speech, gesture, and facial expression. ECAs are one way of leveraging such visual cues.
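To make the cue-combination idea above concrete, the following Python sketch shows one simple way a dialog system might fuse auditory, visual, and content cues into a turn-taking decision. It is an illustration only, not the CHIL implementation: all feature names, weights, and the decision threshold are assumptions introduced here for the example.

# Illustrative sketch: fusing multimodal cues into an end-of-turn decision.
# Every feature, weight, and threshold below is a hypothetical placeholder.

from dataclasses import dataclass

@dataclass
class CueFrame:
    """Cue observations for the current moment in the conversation."""
    pause_ms: float             # auditory cue: silence since the user last spoke
    final_pitch_falling: bool   # auditory cue: falling terminal intonation contour
    gaze_at_system: bool        # visual cue: user looks back at the agent
    utterance_complete: bool    # content cue: utterance judged pragmatically complete

def turn_yield_score(frame: CueFrame) -> float:
    """Combine the cues into a single turn-yield score in [0, 1].

    The weights are arbitrary; a real system would learn them from
    annotated conversational data rather than fix them by hand.
    """
    score = 0.0
    score += 0.35 * min(frame.pause_ms / 1000.0, 1.0)  # longer pause, stronger cue
    score += 0.25 * frame.final_pitch_falling
    score += 0.15 * frame.gaze_at_system
    score += 0.25 * frame.utterance_complete
    return score

def system_may_speak(frame: CueFrame, threshold: float = 0.6) -> bool:
    """Take the turn only when the combined evidence crosses the threshold."""
    return turn_yield_score(frame) >= threshold

if __name__ == "__main__":
    # A long pause with falling pitch, returned gaze, and a complete
    # utterance: strong combined evidence that the user has yielded the turn.
    frame = CueFrame(pause_ms=800, final_pitch_falling=True,
                     gaze_at_system=True, utterance_complete=True)
    print(system_may_speak(frame))  # True

The point of the sketch is that no single cue decides the matter; as in human conversation, evidence from several modalities is weighed together before the system takes the floor.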