2008
DOI: 10.1007/978-3-540-85483-8_18

Predicting Listener Backchannels: A Probabilistic Multimodal Approach

Abstract: During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Model (HMM) or Conditional Random Fields (CRF)) can automatically learn from a database of human-to-human interactions to predict…

Citations: cited by 91 publications (69 citation statements)
References: 23 publications
“…Common findings have been a final falling or rising pitch or final low/high pitch levels as distinctive prosodic features indicating the end of a turn (Ward and Tsukahara, 2000; Koiso et al, 1998). Duration and energy can sometimes play a role, as well as features relating to semantic or syntactic completeness (Ward and Tsukahara, 2000; Cathcart et al, 2003; Morency et al, 2008). The latter tend to denote the completion of a grammatical clause or constituents as indicated through its part-of-speech (POS) sequence.…”
Section: Backchannels and Barge-ins
Citation type: mentioning
confidence: 99%
“…Previous studies on the triggers of backchannels and barge-ins in human-human conversation have revealed the importance of prosodic features, such as pitch, duration, and energy, and features relating to syntactic and semantic completeness (Koiso et al, 1998; Ward and Tsukahara, 2000; Cathcart et al, 2003; Morency et al, 2008; Gravano and Hirschberg, 2009; Oertel et al, 2012). The latter can refer to the grammatical completeness of constituents, e.g., such as a full NP versus just the determiner.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…To solve this problem, a number of models have been proposed for determining the appropriate timing of feedback (ranging from rules-based to complex machine learning approaches, e.g., [25,17]) and for turning different feedback functions into nonverbal as well as vocal and linguistic behaviour [24,22,5]. Less attention has been paid to the question which feedback function to use (exceptions being [12,14,4]), mainly due to the open challenge of understanding unrestricted spoken language in large domains, which would lead agents to give frequent and less informative signals of non-understanding.…”
Section: Communicative Feedback In Human-agent Interaction
Citation type: mentioning
confidence: 99%
“…Researchers in the virtual agents community have noticed the importance of these mechanisms and have started to develop systems that act as 'active listeners', i.e., agents that produce feedback signals in response to user actions [12,14,17,4,5]. In contrast to this, the at least equally important capability of being able to perceive, interpret, and respond to communicative user feedback is effectively non-existent in conversational virtual agents (but see [19] for a first effort).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…al [14] use a decision tree to enable a system learn when a silence signals a wish to give turn and Schlangen [15] has successfully used machine learning to categorize prosodic features from a corpus. Morency et al [16] use Hidden Markov Model to learn feature selection for predicting back-channel feedback opportunities. However, by these studies, by and large, ignore the active element in dialogue -the need to test the quality of perceptual categorization by generating realtime behavior based on these, and monitoring the result.…”
Section: Related Work
Citation type: mentioning
confidence: 99%