2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids)
DOI: 10.1109/humanoids.2013.7029977

Active-speaker detection and localization with microphones and cameras embedded into a robotic head

Abstract: In this paper we present a method for detecting and localizing an active speaker, i.e., a speaker that emits a sound, through the fusion of visual reconstruction with a stereoscopic camera pair and sound-source localization with several microphones. Both the cameras and the microphones are embedded into the head of a humanoid robot. The proposed statistical fusion model associates 3D faces of potential speakers with 2D sound directions. The paper has two contributions: (i) a method that discretiz…
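
To make the association step described in the abstract concrete, the sketch below pairs 3D face positions (as would be reconstructed by the stereo camera pair) with an estimated sound direction by picking the face whose direction from the head is angularly closest. This is a simplified nearest-direction association under an assumed head-centered coordinate frame; it is not the paper's statistical fusion model, and all function and variable names are illustrative.

```python
# Simplified nearest-direction association between 3D faces and a 2D sound
# direction. NOT the paper's statistical fusion model; coordinate convention
# and names are assumptions.
import numpy as np

def direction_of(point_3d):
    """Unit vector from the head origin toward a 3D point (head frame)."""
    p = np.asarray(point_3d, dtype=float)
    return p / np.linalg.norm(p)

def sound_direction(azimuth_rad, elevation_rad):
    """Unit vector for a sound direction given azimuth and elevation.
    Assumed convention: x forward, y left, z up."""
    return np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])

def active_speaker(face_positions_3d, azimuth_rad, elevation_rad):
    """Index of the face whose direction is angularly closest to the
    estimated sound direction."""
    s = sound_direction(azimuth_rad, elevation_rad)
    angles = [np.arccos(np.clip(np.dot(direction_of(p), s), -1.0, 1.0))
              for p in face_positions_3d]
    return int(np.argmin(angles))

# Hypothetical example: two reconstructed faces (metres, head frame) and a
# sound estimated at 20 degrees azimuth, 0 degrees elevation.
faces = [[1.5, 0.6, 0.1], [1.8, -0.4, 0.0]]
print(active_speaker(faces, np.radians(20.0), 0.0))  # -> 0
```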

Cited by 31 publications (27 citation statements). References 30 publications.

“…In the absence of visual cues, for example in complete darkness, we tend to turn our heads in a quite exaggerated way to explore aural signal. This suggests that visual and aural directional cues are integrated by a top level worldview manager incorporating information from both (plus other) types of sensory system, and this approach has been exploited in robotic systems (e.g., [35]). …”
Section: Remarks
confidence: 99%
“…It is only relatively recently that binaural sensing in robotic systems has developed sufficiently for the deployment of processes such as finding directions to acoustic sources [25][26][27][28][29][30][31][32][33][34][35][36]. Acoustic source localization has been largely restricted to estimating azimuth [26][27][28][29][30][31][32][33] on the assumption of zero elevation, except where audition has been fused with vision for estimates also of elevation [34,35,37,38].…”
Section: Introduction
confidence: 99%
“…Then, to localize sound sources using acoustic signals, some methods also have been developed. For instance, sound source localization using time frequency histogram by two microphones [6], through the fusion between visual reconstruction with a stereoscopic camera pair with several microphones [7], and the application of envelope and wavelet transform to enhance the resolution of the received signals through the combination of different time-frequency contents [8].…”
Section: Icose Conference Proceedings
confidence: 99%
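
The two-microphone methods mentioned in the passage above typically start from an estimate of the time difference of arrival between the microphones. The sketch below shows a minimal cross-correlation estimate of that delay; it assumes synchronized signals containing the same burst and is not a reproduction of the cited methods [6]-[8].

```python
# Minimal time-delay estimation between two microphone signals by
# cross-correlation; a building block for two-microphone localization.
import numpy as np

def estimate_delay(sig_left, sig_right, sample_rate_hz):
    """Delay (seconds) by which sig_right lags sig_left, taken at the peak
    of the full cross-correlation of the two signals."""
    corr = np.correlate(sig_right, sig_left, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_left) - 1)
    return lag / sample_rate_hz

# Hypothetical example: the same noise burst arrives 5 samples later at the
# right microphone (16 kHz sampling).
rng = np.random.default_rng(0)
fs = 16000
burst = rng.standard_normal(1024)
left = np.concatenate([burst, np.zeros(5)])
right = np.concatenate([np.zeros(5), burst])
print(estimate_delay(left, right, fs) * fs)  # -> ~5 samples of lag
```
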
“…Acoustic source localization has been largely restricted to estimating azimuth [1-13] on the assumption of zero elevation, except where audition has been fused with vision for estimates also of elevation [14,15,17,18]. Information gathered as the head is turned has been exploited either to locate the azimuth at which the ITD reduces to zero, thereby determining the azimuthal direction to a source, or to resolve the front-back ambiguity associated with estimating only the azimuth [4][5][6][7][8][9][10][11][12][13][14]19,20].…”
Section: Introduction
confidence: 99%
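
As an illustration of the ITD-azimuth relationship invoked in the statement above (turning the head until the ITD reduces to zero, and the front-back ambiguity of azimuth-only estimates), here is a hedged sketch under a simple free-field, far-field two-microphone model; the microphone spacing, speed of sound, and function names are assumptions rather than values from the cited papers.

```python
# Free-field, far-field ITD model for a two-microphone (binaural) setup.
# A sketch of the relationship, not any cited paper's specific method.
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def itd_from_azimuth(azimuth_rad, mic_spacing_m):
    """Interaural time difference for a far-field source at a given azimuth:
    ITD = (d / c) * sin(azimuth)."""
    return (mic_spacing_m / SPEED_OF_SOUND) * math.sin(azimuth_rad)

def azimuth_from_itd(itd_s, mic_spacing_m):
    """Invert the model. Note the front-back ambiguity: a source at
    (pi - azimuth) yields the same ITD, which is why turning the head until
    the ITD reaches zero helps point at the source unambiguously."""
    x = max(-1.0, min(1.0, itd_s * SPEED_OF_SOUND / mic_spacing_m))
    return math.asin(x)

# Hypothetical example: microphones 0.15 m apart, source at 30 deg azimuth.
itd = itd_from_azimuth(math.radians(30.0), 0.15)
print(math.degrees(azimuth_from_itd(itd, 0.15)))  # -> ~30.0
```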