Proceedings of the 2009 International Conference on Multimodal Interfaces 2009
DOI: 10.1145/1647314.1647332

Multimodal end-of-turn prediction in multi-party meetings

Abstract: One of the many skills required to engage properly in a conversation is knowing the appropriate use of the rules of engagement. In order to engage properly in a conversation, a virtual human or robot should, for instance, be able to know when it is being addressed or when the speaker is about to hand over the turn. The paper presents a multimodal approach to end-of-speaker-turn prediction using sequential probabilistic models (Conditional Random Fields) to learn a model from observations of real-life multi-party me…
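
The abstract frames end-of-turn prediction as sequence labelling with Conditional Random Fields over per-frame multimodal observations. A minimal sketch of how such observations might be packaged as labelled sequences for a linear-chain CRF toolkit (e.g. the list-of-feature-dicts shape sklearn-crfsuite consumes); the feature names and values here are illustrative assumptions, not the paper's actual feature set:

```python
# Sketch: framing per-frame multimodal observations as labelled sequences
# for a linear-chain CRF. Feature names below are hypothetical.

def frame_features(frame):
    """Turn one observation frame into a CRF feature dict."""
    return {
        "is_speaking": frame["is_speaking"],            # voice activity
        "pause_ms": frame["pause_ms"],                  # silence since last word
        "gaze_at_listener": frame["gaze_at_listener"],  # speaker's gaze target
        "pitch_slope": frame["pitch_slope"],            # falling pitch often precedes turn ends
    }

def build_sequence(frames, turn_end_index):
    """Label every frame 'keep' except the frame where the turn ends."""
    X = [frame_features(f) for f in frames]
    y = ["end" if i == turn_end_index else "keep" for i in range(len(frames))]
    return X, y

# Toy example: three frames, turn ends on the last one.
frames = [
    {"is_speaking": True,  "pause_ms": 0,   "gaze_at_listener": False, "pitch_slope": 0.1},
    {"is_speaking": True,  "pause_ms": 0,   "gaze_at_listener": True,  "pitch_slope": -0.2},
    {"is_speaking": False, "pause_ms": 400, "gaze_at_listener": True,  "pitch_slope": -0.5},
]
X, y = build_sequence(frames, turn_end_index=2)
print(y)  # → ['keep', 'keep', 'end']
```

A trained CRF would score the "end" label jointly over the whole sequence rather than frame by frame, which is what distinguishes this framing from an independent per-frame classifier.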

Cited by 44 publications (25 citation statements)
References 27 publications
“…Several studies have utilized the automatic detection model to determine whether turn-changing takes place in multi-party conversation by using speech processing techniques [14, 21-25] and nonverbal behaviors such as eye-gaze behavior [14, 21, 22] and physical visual motion from image processing [21, 22, 26] near the end of a current speaker's utterance. However, these studies only estimate if the current speaker will continue speaking.…”
Section: Prediction of Next Speaker and Utterance Interval (mentioning)
confidence: 99%
“…Results of running two-sided Wilcoxon signed-rank tests among models (2)-(4) and among (5)-(7) are shown. Results for the three pairs of conditions (2) vs (5), (3) vs (6), and (4) vs (7) are also shown. * stands for p-value < 0.05, while ** stands for p-value ≪ 0.001.…”
Section: Results (mentioning)
confidence: 99%
“…With such knowledge, many studies have developed models for predicting actual turn-changing, i.e., whether turn-changing or turn-keeping will take place, on the basis of acoustic features [3, 6, 10, 12, 18, 26, 34, 36-38, 43, 47, 50], linguistic features [34, 37, 38, 43], and visual features, such as overall physical motion [3, 6, 8, 43] near the end of a speaker's utterances or during multiple utterances. Moreover, some research has focused on detailed non-verbal behaviors such as eye-gaze behavior [3, 6, 18, 20, 24, 26], head movement [18, 21, 22], mouth movement [23], and respiration [20, 25]. However, many turn-changing prediction studies use mainly features extracted from speakers.…”
Section: Related Work 2.1 Turn-Changing Prediction Technology (mentioning)
confidence: 99%
“…We find that there are significant performance benefits to modeling linguistic features at a slower temporal rate, and in a separate sub-network from acoustic features. We also find that our approach can be used to incorporate gaze features into turn-taking models, a task that has been previously found to be difficult [4].…”
Section: Introduction (mentioning)
confidence: 81%