ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9747056

Gated Multimodal Fusion with Contrastive Learning for Turn-Taking Prediction in Human-Robot Dialogue

Cited by 8 publications
(2 citation statements)
References 18 publications
“…While gaze aversion and light indication helped decrease the interruptions, but they still occurred in most of the conversations. Turn-taking behavior can be further improved by using dialogue context (e.g., [75]), speech prosody (e.g., [76]), estimating the user's gaze, gestures, and facial expressions, such as eyebrow movement and mouth opening [77], or a combination of these features, e.g., [78][79][80] (for an in-depth review of turn-taking in HRI and conversational systems, see [81]) to prevent interrupting the user, both for improving user experience and speech recognition.…”
Section: Turn-taking (mentioning)
confidence: 99%
“…Fine-grained address entity recognition from calls is an important task in many applications (Wu and Juang, 2022;Yang et al, 2022), such as obtaining delivery addresses from E-commerce assistant or post-sale service (Eligüzel et al, 2020). To be specific, the scenario considered in this paper is the extraction of fine-grained address entities distributed through the multi-turn spoken dialogue contexts.…”
Section: Introduction (mentioning)
confidence: 99%