DOI: 10.1007/978-3-540-85483-8_17

Learning Smooth, Human-Like Turntaking in Realtime Dialogue

Cited by 26 publications (23 citation statements)
References 11 publications
“…Our speaking agent, Askur, performs this task by learning to appropriately adjust its silence tolerance during the dialogue (See Figure 1). The architectural framework is described in more detail in [8] and [18]; a quick review of this work will aid in understanding what follows. The architecture, which is in continuous development, currently consists of 35 interacting modules in a publish-subscribe message passing framework.…”
Section: System Architecture
confidence: 99%
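
As a rough illustration of the publish-subscribe arrangement mentioned in this excerpt, the Python sketch below shows modules registering handlers for message types on a shared bus. The class names, message types, and the initial silence-tolerance value are illustrative assumptions and are not taken from the cited architecture.

# Minimal sketch of a publish-subscribe message-passing framework, assuming a
# single-process dispatcher; module and message names are illustrative only.
from collections import defaultdict
from typing import Callable, Dict, List

class MessageBus:
    """Routes published messages to all modules subscribed to their type."""
    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, msg_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[msg_type].append(handler)

    def publish(self, msg_type: str, payload: dict) -> None:
        for handler in self._subscribers[msg_type]:
            handler(payload)

class TurnTakingModule:
    """Hypothetical module that holds an adjustable silence tolerance."""
    def __init__(self, bus: MessageBus) -> None:
        self.silence_tolerance_ms = 500.0   # initial guess, not a value from the paper
        bus.subscribe("speech.ended", self.on_speech_ended)

    def on_speech_ended(self, payload: dict) -> None:
        print(f"Speech ended; waiting {self.silence_tolerance_ms:.0f} ms before speaking")

bus = MessageBus()
TurnTakingModule(bus)
bus.publish("speech.ended", {"t": 12.3})
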
“…In Jonsdottir and Thórisson (2008) [8] we described the first version of the system and presented data on its learning ability when interacting with another artificial agent (a non-learning copy of itself), listening for features of the prosody of the Loquendo speech synthesizer to determine its turntaking predictions and behavior. The results, while promising, described interaction sessions between the system and a single synthesized voice, with negligible noise in the audio channel.…”
Section: Introduction
confidence: 99%
“…However, this means relying on (often inaccurate) speech recognition and difficult natural language understanding. Therefore, usually only prosodic information is used with end of turn detection, for example by Jonsdottir et al (2008). Unfortunately, very few papers about virtual agents with speech input explain in detail how end of turn detection was implemented.…”
Section: Turn-taking
confidence: 99%
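
To make the prosody-only approach concrete, here is a minimal sketch of an end-of-turn heuristic that combines a falling final pitch contour with a silence threshold. The feature choice and threshold values are assumptions for illustration, not figures from the cited papers.

# Hedged sketch of prosody-only end-of-turn detection: a falling pitch contour
# followed by a sufficiently long silence is treated as a turn boundary.
from typing import Sequence

def pitch_slope(f0_tail: Sequence[float]) -> float:
    """Least-squares slope of the final F0 samples (Hz per frame)."""
    n = len(f0_tail)
    if n < 2:
        return 0.0
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(f0_tail) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, f0_tail))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den if den else 0.0

def looks_like_end_of_turn(f0_tail: Sequence[float], silence_ms: float,
                           silence_tolerance_ms: float = 400.0) -> bool:
    falling = pitch_slope(f0_tail) < -0.5      # assumed threshold
    return falling and silence_ms >= silence_tolerance_ms

print(looks_like_end_of_turn([180, 170, 158, 140, 120], silence_ms=450))  # True
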
“…They all wanted turn-taking to be as smooth as possible, with as little overlap as possible and the silences between turns as short as possible. Another clear example of this is the work of Jonsdottir et al (2008), in which the authors used machine learning techniques to have an agent learn 'proper' turn-taking behaviour. They did this by creating a classifier that uses the prosody of the other person's speech to determine how long to wait before taking the turn.…”
Section: Optimal Turn-taking
confidence: 99%
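
A hedged sketch of the classifier idea described above: the prosody at the end of the other speaker's speech is mapped to a discrete class, and each class carries its own learned waiting time before the agent takes the turn. The two-feature discretization and the wait values are illustrative assumptions, not details from the cited work.

# Map end-of-speech prosody to a class, then look up a learned wait time.
def prosody_class(final_pitch_slope: float, final_energy_db: float) -> str:
    if final_pitch_slope < -0.5 and final_energy_db < -30.0:
        return "turn-final"          # falling pitch, trailing off
    if final_pitch_slope > 0.5:
        return "turn-holding"        # rising pitch, e.g. a mid-utterance pause
    return "ambiguous"

# Learned per-class silence tolerances (ms); initial values are arbitrary.
wait_before_turn_ms = {
    "turn-final": 200.0,
    "ambiguous": 600.0,
    "turn-holding": 1200.0,
}

print(wait_before_turn_ms[prosody_class(-0.8, -35.0)])  # short wait: 200.0
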
“…Jonsdottir et al [130] contribute a talking agent that can learn fluent turntaking behavior during an interaction. Using online machine learning with prosodic features as input, the agent learns to time the start of its turn in such a way that silence between turns is minimized.…”
Section: End-of-turn Prediction
confidence: 99%
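
In the same spirit, a minimal sketch of an online update that nudges the learned wait for a given prosody class shorter after a clean, silent transition and longer after an overlap; the step sizes and floor are assumptions, not the update rule used by the cited system.

# Online adjustment of a learned wait time from turn-transition feedback.
def update_wait(wait_ms: float, overlapped: bool,
                shrink: float = 0.9, grow_ms: float = 150.0) -> float:
    if overlapped:
        return wait_ms + grow_ms          # back off after an overlap
    return max(100.0, wait_ms * shrink)   # otherwise creep toward shorter gaps

w = 600.0
for overlapped in [False, False, True, False]:
    w = update_wait(w, overlapped)
print(round(w, 1))  # 572.4 after this sequence
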