2014
DOI: 10.1177/0278364914548050

Vision-guided robot hearing

Abstract: Natural human-robot interaction in complex and unpredictable environments is one of the main research lines in robotics. In typical real-world scenarios, humans are at some distance from the robot and the acquired signals are strongly impaired by noise, reverberations and other interfering sources. In this context, the detection and localisation of speakers plays a key role, since it is the pillar on which several tasks (e.g. speech recognition and speaker tracking) rely. We address the problem of how to detec…

Cited by 30 publications (7 citation statements)
References 49 publications
“…Setting p(z|x, y, m) = p(z|x, m) is equivalent to saying that Z and Y are conditionally independent given x, which can be expressed equivalently. We now further calculate this expression for the J-GMM model defined in (7). Injecting (7) into (9) leads to:…”
Section: Discussion (mentioning, confidence: 99%)
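For readers parsing the quoted derivation, the conditional-independence assumption it refers to is the standard factorisation below. This is a minimal restatement using the symbols of the quote (Z, Y observations, X the conditioning variable, m the mixture component), not the exact notation or equation numbering of either cited paper.

% Conditional independence of Z and Y given (X, m):
p(z \mid x, y, m) = p(z \mid x, m)
\quad\Longleftrightarrow\quad
p(z, y \mid x, m) = p(z \mid x, m)\, p(y \mid x, m)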
“…In contrast, we provided a detailed derivation in [18], where (19) is shown to result in two equivalent forms of a GMR expression, (25) and (26). Also, to be fully precise, (7) in [21] corresponds to (26) in [18] up to two differences that we interpret as typos: First, the term Σ…”
Section: A. E-step (mentioning, confidence: 99%)
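The quoted passage compares equivalent algebraic forms of a Gaussian mixture regression (GMR) expression. As background only, here is a minimal numpy sketch of the standard GMR conditional mean; the function name, argument layout and dimensions are illustrative assumptions, not code from either cited paper.

import numpy as np
from scipy.stats import multivariate_normal

def gmr_conditional_mean(x, weights, means, covs, dx):
    """Conditional mean E[y | x] under a joint GMM over (x, y).

    weights: (M,) mixture weights
    means:   (M, dx+dy) joint means
    covs:    (M, dx+dy, dx+dy) joint covariances
    dx:      dimension of the input part x
    """
    M = len(weights)
    resp = np.empty(M)
    cond_means = []
    for m in range(M):
        mu_x, mu_y = means[m][:dx], means[m][dx:]
        S_xx = covs[m][:dx, :dx]
        S_yx = covs[m][dx:, :dx]
        # responsibility of component m for the observed x
        resp[m] = weights[m] * multivariate_normal.pdf(x, mean=mu_x, cov=S_xx)
        # component-wise conditional mean of y given x
        cond_means.append(mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x))
    resp /= resp.sum()
    return sum(r * cm for r, cm in zip(resp, cond_means))

The two "equivalent forms" discussed in the quote are different ways of writing this same quantity; the sketch uses the responsibility-weighted component-wise form.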
“…In this paper we propose a novel multi-speaker tracking method inspired by previous research on "instantaneous" audio-visual fusion [11,12]. A dynamic Bayesian model is investigated to smoothly fuse acoustic and visual information over time from their feature spaces.…”
Section: Introduction (mentioning, confidence: 99%)
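To make the idea of fusing the two modalities over time concrete, the sketch below shows a generic discrete Bayes filter over a grid of candidate speaker directions. It is not the cited paper's model; the grid size, smoothing width and the per-frame likelihood inputs are hypothetical placeholders.

import numpy as np

N_DIRECTIONS = 72                      # assumed 5-degree azimuth bins
belief = np.full(N_DIRECTIONS, 1.0 / N_DIRECTIONS)

def predict(belief, spread=1):
    """Diffuse the belief to allow for speaker motion between frames."""
    kernel = np.ones(2 * spread + 1)
    kernel /= kernel.sum()
    # circular convolution over the azimuth grid
    padded = np.concatenate([belief[-spread:], belief, belief[:spread]])
    return np.convolve(padded, kernel, mode="valid")

def update(belief, audio_lik, visual_lik):
    """Fuse the modalities, assuming they are conditionally independent given the direction."""
    posterior = belief * audio_lik * visual_lik
    return posterior / posterior.sum()

# one filtering step with made-up per-direction likelihoods
audio_lik = np.random.rand(N_DIRECTIONS)
visual_lik = np.random.rand(N_DIRECTIONS)
belief = update(predict(belief), audio_lik, visual_lik)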
“…In this paper we propose to enforce audio-visual spatial coincidence, e.g., [1,8,10], rather than temporal coincidence, e.g., correlation [9,16], in diarization. We consider a setup consisting of people who are engaged in a multiparty conversation while they are free to move and to turn their attention away from the cameras.…”
Section: Introduction (mentioning, confidence: 99%)