2012
DOI: 10.1007/978-3-642-27355-1_71
Multimodal Cue Detection Engine for Orchestrated Entertainment

Abstract: In this paper, we describe a low-delay real-time multimodal cue detection engine for a living room environment. The system is designed to be used in open, unconstrained environments, allowing multiple people to enter, interact and leave the observable world with no constraints. It comprises detection and tracking of up to four faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and their association and fusion. The system is designed …
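The abstract mentions associating localised audio events with tracked faces. As a minimal illustrative sketch (not the paper's actual method), one common approach is to assign each sound event to the tracked face whose estimated direction is closest to the event's direction of arrival; all names and the angular threshold below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class FaceTrack:
    face_id: int
    azimuth_deg: float   # estimated horizontal direction of the tracked face

@dataclass
class AudioEvent:
    t_start: float
    t_end: float
    azimuth_deg: float   # direction of arrival estimated from a microphone array

def associate(event: AudioEvent, faces: list[FaceTrack], max_gap_deg: float = 15.0):
    """Assign a localised audio event to the nearest tracked face by azimuth.

    Returns the matching face_id, or None if no face lies within max_gap_deg.
    """
    best = None
    best_gap = max_gap_deg
    for face in faces:
        # Wrap the angular difference into [-180, 180] before taking its magnitude.
        gap = abs(((event.azimuth_deg - face.azimuth_deg + 180.0) % 360.0) - 180.0)
        if gap <= best_gap:
            best, best_gap = face.face_id, gap
    return best
```

A real engine would also weigh temporal overlap with lip activity and visual focus of attention before fusing modalities; this sketch shows only the spatial-association step.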

Cited by 3 publications (1 citation statement)
References 16 publications
“…As examples of content-based annotations, the processing of audio utterances has been used to perform speech segmentation, word spotting, skimming and topic segmentation [Bouamrane and Luz, 2006], whereas speech recognition has been applied to index discourse using text-based techniques [Kaptein and Marx, 2010]. Video-based sources have been used to index non-verbal interaction, for instance via computer vision to detect behaviors from patterns of eye gaze, gestures and focus of attention [Korchagin et al., 2012]. Context-based sources serve to associate external attributes in order to enrich the description or the content of the media elements.…”

Section: Abstraction
Confidence: 99%