Proceedings of the 9th International Conference on Multimodal Interfaces 2007
DOI: 10.1145/1322192.1322254
On-line multi-modal speaker diarization

Cited by 22 publications (17 citation statements) | References 8 publications
“…Though the results seem comparable to the state-of-the-art, the solution requires specialized hardware. The work presented in [106] integrates audiovisual features for on-line audiovisual speaker diarization using a dynamic Bayesian network (DBN) but tests were limited to discussions with two to three people on two short test scenarios. Another use of DBN, also called factorial HMMs [107], is proposed in [108] as an audiovisual framework.…”
Section: Overlap Detection
confidence: 99%
“…However, it is vulnerable to errors during periods of overlapping speech, even when multiple audio sources are used to estimate delays between captured audio signals. One solution is to use visual cues to solve the problem audio-visually [8,9,10], but improvements are not always consistent, so it is difficult to conclude when they are useful.…”
Section: Introduction
confidence: 99%
“…Much previous work that exploits temporal correspondences between speech and vision has tended to assume that motion from the mouth is the principal visual manifestation of speech [11,8]. However, there is considerable evidence from both social psychology [12] and computational methods [13,9] to suggest that speaking in conversations can manifest itself in broader body motions, which psychologists suggest aid cognitive communicative processes [12].…”
Section: Introduction
confidence: 99%
“…Noulas and Krose [6] investigated an on-line multimodal speaker diarisation system based on dynamic Bayesian networks and audio-visual mutual information in a constrained setting (videos of two seated people speaking in turns). An interesting two-step real-time multimodal system to analyse group meetings was proposed by Otsuka et al. [8].…”
Section: Introduction
confidence: 99%
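The audio-visual mutual-information cue mentioned in the statement above can be illustrated with a minimal sketch: score each participant by the mutual information between the audio energy signal and that participant's visual motion signal, and attribute speech to the highest-scoring track. This is a generic histogram-based MI estimator for illustration only (all function names are hypothetical), not the actual model of [6], which uses a dynamic Bayesian network.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information (in nats) between two 1-D signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                      # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)            # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)            # marginal p(y)
    nz = pxy > 0                                   # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def pick_speaker(audio_energy, motion_tracks):
    """Return the index of the motion track sharing most information with the audio."""
    scores = [mutual_information(audio_energy, m) for m in motion_tracks]
    return int(np.argmax(scores))
```

With per-frame audio energy and one motion signal per participant (e.g. frame-differencing inside each person's bounding box), `pick_speaker` labels each window with the participant whose motion best predicts the audio, which is the intuition behind the audio-visual association cue.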
“…Note that our data are more challenging than those used in [6,8] (where participants were assumed to be always seated in front of the camera). The main contribution of this paper is to exploit the role of gaze in coordinating turn-taking, by adopting a novel feature set based on Visual Focus of Attention (VFoA) to improve speaker diarisation.…”
Section: Introduction
confidence: 99%