2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids)
DOI: 10.1109/humanoids.2013.7029977

Active-speaker detection and localization with microphones and cameras embedded into a robotic head

Abstract: In this paper we present a method for detecting and localizing an active speaker, i.e., a speaker that emits a sound, through the fusion of visual reconstruction with a stereoscopic camera pair and sound-source localization with several microphones. Both the cameras and the microphones are embedded into the head of a humanoid robot. The proposed statistical fusion model associates 3D faces of potential speakers with 2D sound directions. The paper has two contributions: (i) a method that discretiz…
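
To make the association step described in the abstract concrete, the sketch below pairs 3D face positions (as would be reconstructed by the stereo camera pair) with an estimated sound direction by picking the face whose direction from the head is angularly closest. This is a simplified nearest-direction association under an assumed head-centered coordinate frame; it is not the paper's statistical fusion model, and all function and variable names are illustrative.

```python
# Simplified nearest-direction association between 3D faces and a 2D sound
# direction. NOT the paper's statistical fusion model; coordinate convention
# and names are assumptions.
import numpy as np

def direction_of(point_3d):
    """Unit vector from the head origin toward a 3D point (head frame)."""
    p = np.asarray(point_3d, dtype=float)
    return p / np.linalg.norm(p)

def sound_direction(azimuth_rad, elevation_rad):
    """Unit vector for a sound direction given azimuth and elevation.
    Assumed convention: x forward, y left, z up."""
    return np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])

def active_speaker(face_positions_3d, azimuth_rad, elevation_rad):
    """Index of the face whose direction is angularly closest to the
    estimated sound direction."""
    s = sound_direction(azimuth_rad, elevation_rad)
    angles = [np.arccos(np.clip(np.dot(direction_of(p), s), -1.0, 1.0))
              for p in face_positions_3d]
    return int(np.argmin(angles))

# Hypothetical example: two reconstructed faces (metres, head frame) and a
# sound estimated at 20 degrees azimuth, 0 degrees elevation.
faces = [[1.5, 0.6, 0.1], [1.8, -0.4, 0.0]]
print(active_speaker(faces, np.radians(20.0), 0.0))  # -> 0
```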

Cited by 31 publications (27 citation statements). References 30 publications.

“…In the absence of visual cues, for example in complete darkness, we tend to turn our heads in a quite exaggerated way to explore aural signal. This suggests that visual and aural directional cues are integrated by a top level worldview manager incorporating information from both (plus other) types of sensory system, and this approach has been exploited in robotic systems (e.g., [35]). …”
Section: Remarks
confidence: 99%
“…It is only relatively recently that binaural sensing in robotic systems has developed sufficiently for the deployment of processes such as finding directions to acoustic sources [25][26][27][28][29][30][31][32][33][34][35][36]. Acoustic source localization has been largely restricted to estimating azimuth [26][27][28][29][30][31][32][33] on the assumption of zero elevation, except where audition has been fused with vision for estimates also of elevation [34,35,37,38].…”
Section: Introduction
confidence: 99%
“…Then, to localize sound sources using acoustic signals, some methods also have been developed. For instance, sound source localization using time frequency histogram by two microphones [6], through the fusion between visual reconstruction with a stereoscopic camera pair with several microphones [7], and the application of envelope and wavelet transform to enhance the resolution of the received signals through the combination of different time-frequency contents [8].…”
Section: Icose Conference Proceedings
confidence: 99%
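
The two-microphone methods mentioned in the passage above typically start from an estimate of the time difference of arrival between the microphones. The sketch below shows a minimal cross-correlation estimate of that delay; it assumes synchronized signals containing the same burst and is not a reproduction of the cited methods [6]-[8].

```python
# Minimal time-delay estimation between two microphone signals by
# cross-correlation; a building block for two-microphone localization.
import numpy as np

def estimate_delay(sig_left, sig_right, sample_rate_hz):
    """Delay (seconds) by which sig_right lags sig_left, taken at the peak
    of the full cross-correlation of the two signals."""
    corr = np.correlate(sig_right, sig_left, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_left) - 1)
    return lag / sample_rate_hz

# Hypothetical example: the same noise burst arrives 5 samples later at the
# right microphone (16 kHz sampling).
rng = np.random.default_rng(0)
fs = 16000
burst = rng.standard_normal(1024)
left = np.concatenate([burst, np.zeros(5)])
right = np.concatenate([np.zeros(5), burst])
print(estimate_delay(left, right, fs) * fs)  # -> ~5 samples of lag
```
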
“…Acoustic source localization has been largely restricted to estimating azimuth [1-13] on the assumption of zero elevation, except where audition has been fused with vision for estimates also of elevation [14,15,17,18]. Information gathered as the head is turned has been exploited either to locate the azimuth at which the ITD reduces to zero, thereby determining the azimuthal direction to a source, or to resolve the front-back ambiguity associated with estimating only the azimuth [4][5][6][7][8][9][10][11][12][13][14]19,20].…”
Section: Introduction
confidence: 99%
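
As an illustration of the ITD-azimuth relationship invoked in the statement above (turning the head until the ITD reduces to zero, and the front-back ambiguity of azimuth-only estimates), here is a hedged sketch under a simple free-field, far-field two-microphone model; the microphone spacing, speed of sound, and function names are assumptions rather than values from the cited papers.

```python
# Free-field, far-field ITD model for a two-microphone (binaural) setup.
# A sketch of the relationship, not any cited paper's specific method.
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def itd_from_azimuth(azimuth_rad, mic_spacing_m):
    """Interaural time difference for a far-field source at a given azimuth:
    ITD = (d / c) * sin(azimuth)."""
    return (mic_spacing_m / SPEED_OF_SOUND) * math.sin(azimuth_rad)

def azimuth_from_itd(itd_s, mic_spacing_m):
    """Invert the model. Note the front-back ambiguity: a source at
    (pi - azimuth) yields the same ITD, which is why turning the head until
    the ITD reaches zero helps point at the source unambiguously."""
    x = max(-1.0, min(1.0, itd_s * SPEED_OF_SOUND / mic_spacing_m))
    return math.asin(x)

# Hypothetical example: microphones 0.15 m apart, source at 30 deg azimuth.
itd = itd_from_azimuth(math.radians(30.0), 0.15)
print(math.degrees(azimuth_from_itd(itd, 0.15)))  # -> ~30.0
```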