Incremental Multimodal Feedback for Conversational Agents

Kopp, Stefan; Stocksmeier, Thorsten; Gibbon, Dafydd

doi:10.1007/978-3-540-74997-4_13

Cited by 19 publications

(17 citation statements)

References 12 publications

“…Other approaches take into account also a semantic analysis of what the speaker is saying. When coupling with a model of the agent's mental state, these models ensure that the agent displays coherent and appropriate backchannel signals [151].…”

Section: Synthesis Of Social Actions

mentioning

confidence: 99%

Bridging the Gap between Social Animal and Unsocial Machine: A Survey of Social Signal Processing

Vinciarelli, Pantić, Heylen, et al. 2012

IEEE Trans. Affective Comput.

Social Signal Processing is the research domain aimed at bridging the social intelligence gap between humans and machines. This article is the first survey of the domain that jointly considers its three major aspects, namely modeling, analysis and synthesis of social behaviour. Modeling investigates laws and principles underlying social interaction, analysis explores approaches for automatic understanding of social exchanges recorded with different sensors, and synthesis studies techniques for the generation of social behaviour via various forms of embodiment. For each of the above aspects, the paper includes an extensive survey of the literature, points to the most important publicly available resources, and outlines the most fundamental challenges ahead.

“…Most research has focused on individual behaviors such as rapidly synthesizing the gestures and facial expressions that co-occur with speech [5,25,22,35] or real-time recognition of the speech and gesture of a human speaker [30,8]. But as these techniques have matured, virtual human research has increasingly focused on dyadic factors such as the feedback a listener provides in the midst of the other participant's speech [16,23]. These include recognizing and generating backchannel or jump-in points [39], turn-taking and floor control signals, postural mimicry [14] and emotional feedback [19,1].…”

mentioning

confidence: 99%

A probabilistic multimodal approach for predicting listener backchannels

Morency, Kok, Gratch 2009

Auton Agent Multi-Agent Syst

During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Model or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.

Keywords: Listener backchannel feedback · Nonverbal behavior prediction · Sequential probabilistic model · Conditional random field · Head nod · Multimodal

“…Studies on the social interaction of human-computer interfaces have included conversations with robots [11][12][13] and virtual agents [14][15][16][17]. An important cue to recognize social interaction is nonverbal information.…”

Section: Introduction

mentioning

confidence: 99%

“…These studies employ some multimodal information such as hand gestures, head nods, face direction, and gaze direction as well as spoken language to build teamwork in the collaboration with a robot [11,12] or to create a chance to address the user [13]. Meanwhile, Maatman et al. [14] and Kopp et al. [15] studied the natural behavior of the agent while the user is speaking. These virtual agents need to generate nonverbal outputs to the conversation partner, and these outputs directly affect the naturalness of the dialog.…”

Section: Introduction

mentioning

confidence: 99%

Estimating a User's Internal State before the First Input Utterance

Chiba, Ito 2012

Advances in Human-Computer Interaction

This paper describes a method for estimating the internal state of a user of a spoken dialog system before his/her first input utterance. When actually using a dialog-based system, the user is often perplexed by the prompt. A typical system provides more detailed information to a user who is taking time to make an input utterance, but such assistance is a nuisance if the user is merely considering how to answer the prompt. To respond appropriately, the spoken dialog system should be able to consider the user's internal state before the user's input. Conventional studies on user modeling have focused on the linguistic information of the utterance for estimating the user's internal state, but this approach cannot estimate the user's state until the end of the user's first utterance. Therefore, we focused on the user's nonverbal output such as fillers, silence, or head-moving until the beginning of the input utterance. The experimental data was collected on a Wizard of Oz basis, and the labels were decided by five evaluators. Finally, we conducted a discrimination experiment with the trained user model using combined features. As a three-class discrimination result, we obtained about 85% accuracy in an open test.
