Proceedings of the 2009 International Conference on Multimodal Interfaces (ICMI 2009)
DOI: 10.1145/1647314.1647370

Multi-modal and multi-camera attention in smart environments

Abstract: This paper considers the problem of multi-modal saliency and attention. Saliency is a cue that is often used for directing the attention of a computer vision system, e.g., in smart environments or for robots. Unlike the majority of recent publications on visual/audio saliency, we aim at a well-grounded integration of several modalities. The proposed framework is based on fuzzy aggregations and offers a flexible, plausible, and efficient way of combining multi-modal saliency information. Besides incorporating diff…
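The fuzzy-aggregation idea in the abstract can be sketched as blending a fuzzy AND (min) and a fuzzy OR (max) over per-modality saliency maps. This is a minimal illustration only; the paper's specific aggregation operators are not reproduced here, and `fuzzy_aggregate` is a hypothetical helper:

```python
import numpy as np

def fuzzy_aggregate(saliency_maps, andness=0.5):
    """Fuse per-modality saliency maps (values in [0, 1]) with a simple
    fuzzy aggregation that interpolates between fuzzy AND (min) and
    fuzzy OR (max). andness = 1 demands agreement of all modalities;
    andness = 0 lets any single modality dominate."""
    stack = np.stack([np.clip(m, 0.0, 1.0) for m in saliency_maps])
    fuzzy_and = stack.min(axis=0)
    fuzzy_or = stack.max(axis=0)
    return andness * fuzzy_and + (1.0 - andness) * fuzzy_or

# Toy 2x2 maps for a visual and an audio modality
visual = np.array([[0.9, 0.1], [0.2, 0.8]])
audio = np.array([[0.7, 0.0], [0.1, 0.9]])
fused = fuzzy_aggregate([visual, audio], andness=0.5)
```

Varying `andness` trades off requiring cross-modal agreement against letting one strong modality attract attention on its own.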

Cited by 16 publications (14 citation statements). References 22 publications.
“…For active scene exploration, saliency can be used to steer the sensors towards salient, thus potentially relevant, regions to detect objects of interest (e.g. [8], [10], [12]). Combining these methods, [14] utilized bottom-up attention, stereo vision and SIFT to perform robust and efficient scene analysis on a mobile robot.…”
Section: Related Work
confidence: 99%
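The steering step described in the statement above, pointing sensors at the most salient region, can be sketched as an exhaustive search for the image window with the highest summed saliency. This is a stand-in for illustration (`most_salient_window` is a hypothetical helper); the cited systems use richer attention models:

```python
import numpy as np

def most_salient_window(saliency, win=3):
    """Return the top-left (row, col) of the win x win window with the
    largest summed saliency -- a stand-in for deciding where to point
    a camera (or microphone array) next."""
    h, w = saliency.shape
    best_sum, best_rc = -1.0, (0, 0)
    for r in range(h - win + 1):
        for c in range(w - win + 1):
            s = saliency[r:r + win, c:c + win].sum()
            if s > best_sum:
                best_sum, best_rc = s, (r, c)
    return best_rc

sal = np.zeros((6, 6))
sal[4, 4] = 1.0          # one salient spot in the lower right
target = most_salient_window(sal)
```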
“…[8]) or audio-visually (e.g. [10], [12]) salient regions is a natural and efficient method to detect and focus on interacting persons.…”
Section: Realization
confidence: 99%
“…5.1), this is sufficient. Note that, in a multi-camera setting, a view selection algorithm can be applied choosing the two "best" views according to some global criteria [6]. Aligned trajectory points are then projected to 3D by ray casting.…”
Section: 3D Combination
confidence: 99%
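The quoted pipeline lifts aligned trajectory points to 3D by ray casting from the two selected views. A minimal stand-in, assuming calibrated 3x4 projection matrices and normalized pixel coordinates, is linear (DLT) triangulation; the cited system's exact ray-casting procedure may differ:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover the 3D point observed at
    pixel x1 in view 1 and x2 in view 2, given the 3x4 projection
    matrices P1 and P2 of the two views."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The 3D point (homogeneous) is the null vector of A
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Two unit cameras: identity intrinsics, second shifted along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = triangulate(P1, P2, (0.2, 0.4), (0.0, 0.4))
```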
“…In particular, we propose representing a gesture by projection on its principal plane of motion, which we call the action plane. For the acquisition of gesture trajectories, we build upon our previous work on 3D pointing gesture recognition [5] and saliency-based view selection in multi-camera setups [6].…”
Section: Introduction
confidence: 99%
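Projection onto a principal plane of motion, as described above, can be sketched with an SVD of the mean-centered trajectory: the two dominant right singular vectors span the plane of largest motion variance. This is an assumed PCA-style construction for illustration; the cited paper's exact definition of the action plane may differ:

```python
import numpy as np

def action_plane_projection(traj):
    """Project a 3D trajectory (N x 3 array) onto its principal plane
    of motion: the plane spanned by the two dominant principal
    components of the mean-centered points."""
    centered = traj - traj.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane_basis = vt[:2]             # two directions of largest variance
    return centered @ plane_basis.T  # N x 2 in-plane coordinates

# A square drawn in the z = 3 plane collapses to 2D without distortion
traj = np.array([[0., 0., 3.], [1., 0., 3.], [0., 1., 3.], [1., 1., 3.]])
flat = action_plane_projection(traj)
```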
“…For this purpose, the identities of the persons in the room have to be determined (see (Salah et al., 2008)) and the audio-visual focus of attention has to be estimated (see (Voit and Stiefelhagen, 2010; Schauerte et al., 2009)), e.g. to present personalized information on the display a person is currently looking at.…”
Section: Introduction
confidence: 99%