“…The most common multimodal settings combined landmarks, body/head pose, or visual cues with past utterance transcriptions (Chu et al., 2018; Hua et al., 2019; Ueno et al., 2020), acoustic features (Türker et al., 2018; Ahuja et al., 2019; Ueno et al., 2020; Goswami et al., 2020; Woo et al., 2021; Jain and Leekha, 2021; Murray et al., 2021; Ben-Youssef et al., 2021), speaker metadata (Raman et al., 2021), or with combinations of the previous modalities (Ishii et al., 2020; Huang et al., 2020; Blache et al., 2020; Ishii et al., 2021; Boudin et al., 2021). The most common way to exploit different modalities together is simply to concatenate their embedded representations.…”
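For illustration, concatenation-based fusion can be sketched as below. This is a minimal PyTorch example, not the architecture of any cited work: the module name, the choice of encoders, and all dimensions (pose, audio, and text feature sizes) are hypothetical stand-ins for whatever per-modality representations a given system produces.

```python
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Hypothetical sketch of multimodal fusion by concatenation.

    Each modality (e.g., pose landmarks, acoustic features, utterance
    embeddings) is projected into its own embedding space; the embeddings
    are then concatenated along the feature dimension and fed to a
    downstream predictor.
    """

    def __init__(self, pose_dim=136, audio_dim=40, text_dim=300,
                 embed_dim=128, num_classes=2):
        super().__init__()
        # One encoder per modality (dimensions are illustrative only).
        self.pose_enc = nn.Linear(pose_dim, embed_dim)
        self.audio_enc = nn.Linear(audio_dim, embed_dim)
        self.text_enc = nn.Linear(text_dim, embed_dim)
        # The prediction head sees the concatenation of all embeddings.
        self.head = nn.Linear(3 * embed_dim, num_classes)

    def forward(self, pose, audio, text):
        # Concatenate the per-modality embeddings on the feature axis.
        z = torch.cat([self.pose_enc(pose),
                       self.audio_enc(audio),
                       self.text_enc(text)], dim=-1)  # (batch, 3 * embed_dim)
        return self.head(z)


# Usage with random stand-in tensors for a batch of 4 examples.
model = ConcatFusion()
logits = model(torch.randn(4, 136), torch.randn(4, 40), torch.randn(4, 300))
print(logits.shape)  # torch.Size([4, 2])
```

Note that the fusion step itself is parameter-free: concatenation only juxtaposes the modality embeddings, leaving any cross-modal interaction to be learned by the downstream layers.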