2011 IEEE/RSJ International Conference on Intelligent Robots and Systems 2011
DOI: 10.1109/iros.2011.6094825

Listening for people: Exploiting the spectral structure of speech to robustly perceive the presence of people

Abstract: As the desire to see robots become ubiquitous in society grows, so does the need to provide robots with the means of building awareness of any humans with whom they may be sharing the environment. This paper presents a system suitable for real-world use that enables robots to robustly perceive the presence of people acoustically. The proposed binaural system first identifies voiced signals by means of a novel approach to Voice Activity Detection that exploits the spectral signature and characteristics of speech without…
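The abstract is truncated before it describes the detector, so the authors' VAD cannot be reproduced here. As a hedged illustration of the general idea of exploiting the harmonic spectral structure of voiced speech, the sketch below scores a frame by the largest fraction of its energy that falls on a harmonic comb k·f0 over a plausible pitch range. The function names, the 80 to 400 Hz pitch range, the comb width and the 0.5 threshold are all assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def harmonic_energy_ratio(frame, fs, f0_min=80.0, f0_max=400.0,
                          n_harmonics=5, half_width=2):
    """Largest fraction of frame energy captured by a comb at k*f0, k = 1..n_harmonics."""
    n = len(frame)
    # Hann window: each harmonic's main lobe spans about +/-2 bins.
    power = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2
    total = power.sum() + 1e-12
    bin_hz = fs / n
    best = 0.0
    for f0 in np.arange(f0_min, f0_max, 1.0):      # 1 Hz grid of pitch candidates
        mask = np.zeros(power.shape, dtype=bool)    # boolean mask so overlapping bins count once
        for k in range(1, n_harmonics + 1):
            centre = int(round(k * f0 / bin_hz))
            if centre - half_width >= len(power):
                break                               # harmonic lies above Nyquist
            mask[max(centre - half_width, 0):centre + half_width + 1] = True
        best = max(best, float(power[mask].sum() / total))
    return best

def is_voiced(frame, fs, threshold=0.5):
    """Hypothetical decision rule: voiced if the harmonic comb explains most of the energy."""
    return harmonic_energy_ratio(frame, fs) >= threshold

# Usage: a 32 ms synthetic "voiced" frame (150 Hz plus harmonics) versus white noise.
fs = 16000
t = np.arange(512) / fs
voiced = sum(np.sin(2 * np.pi * 150.0 * k * t) / k for k in range(1, 6))
noise = np.random.default_rng(0).standard_normal(512)
print(is_voiced(voiced, fs), is_voiced(noise, fs))
```

A broadband noise frame spreads its energy over all bins, so no comb captures much of it, whereas a periodic frame concentrates its energy on the harmonics; a real detector would also need to handle low SNR, reverberation and non-speech harmonic sounds.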

Cited by 6 publications (4 citation statements)
References 12 publications
“…One can cite for instance [80], or [81] where the PhaT processor is exploited on a 24 evenly spaced microphones array fitted on the 3.2 m-long walls of a room which is visited by a tour-guide robot. More recent use of the PhaT approach can be cited: [82] where a triangular 3-microphone array is used to infer source location from short time observations so as to cope with the movement of the sound source or the robot; [83] presents an evaluation of various real-time sound localization approaches from a cubical 8-microphone array in which GCC-PhaT is compared with beamforming techniques; [84] proposes a robust approach to the acoustic perception of the presence of people from a pair of microphones. But GCC-PhaT only takes into account the phase of the perceived signals in the intercorrelation computation, giving the same importance to each frequency.…”
Section: Music (mentioning)
confidence: 99%
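The GCC-PHAT processor mentioned in this statement can be sketched in a few lines of NumPy. This is the textbook estimator, not code from any of the cited works; the function name `gcc_phat`, the `1e-12` regulariser and the 0.18 m microphone spacing used to bound the search are illustrative assumptions. The division by the cross-spectrum magnitude is exactly the property the quotation criticises: every frequency bin is normalised to unit magnitude, so all frequencies weigh equally in the correlation regardless of their signal-to-noise ratio.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Return (tau, cc): delay of x relative to y in seconds, and the GCC-PHAT curve."""
    n = len(x) + len(y)                    # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12         # PHAT weighting: keep phase only, unit magnitude per bin
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        # Physically possible delays are bounded by mic spacing / speed of sound.
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index i corresponds to a lag of (i - max_shift) samples.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc

# Usage: x is a copy of y delayed by 5 samples.
fs = 16000
rng = np.random.default_rng(1)
y = rng.standard_normal(2048)
x = np.concatenate((np.zeros(5), y[:-5]))
tau, _ = gcc_phat(x, y, fs, max_tau=0.18 / 343.0)   # assumed 0.18 m spacing, c = 343 m/s
print(tau * fs)
```

Restricting the search to the physically possible lags (about 8 samples here) is a common way to reject spurious peaks caused by reverberation.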
“…This basic geometric rule is used in [90], the computed azimuths being involved into a neural network based sound source tracker. The same strategy is used in [78], or [84]. Following the same lines, one can deduce the cartesian coordinates $\mathbf{r}_s = (u, v, w)$ of a source from the known positions $\mathbf{r}_{m_n} = (x_n, y_n, z_n)$, $n = 1, \ldots$…”
Section: Music (mentioning)
confidence: 99%
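The excerpt does not restate the "basic geometric rule". Presumably it is the usual far-field relation for a pair of microphones spaced $d$ apart, where a time difference of arrival $\tau$ maps to an azimuth $\theta = \arcsin(c\tau/d)$ measured from the array's broadside. A minimal sketch under that assumption follows; the speed of sound $c = 343$ m/s and the 0.18 m spacing are illustrative values, and `azimuth_from_tdoa` is a hypothetical helper name.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 degrees C)

def azimuth_from_tdoa(tau, mic_distance, c=SPEED_OF_SOUND):
    """Far-field azimuth (radians from broadside) for a two-microphone pair."""
    ratio = c * tau / mic_distance
    # Noisy delay estimates can push |c*tau/d| slightly above 1; clip before asin.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)

# Usage: simulate the delay produced by a source at 30 degrees, then invert it.
d = 0.18
tau = d * math.sin(math.radians(30.0)) / SPEED_OF_SOUND
theta_deg = math.degrees(azimuth_from_tdoa(tau, d))
print(theta_deg)
```

A single pair only resolves the cone of directions around its axis (front/back ambiguity), which is one reason the quoted works move on to several microphones when recovering cartesian source coordinates.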
“…Additionally, in [55], an evaluation of real-time sound localization approaches are compared using an 8-microphone array. Similar architectures were presented in [56,57], where the common need of an array of microphones can be considered as a drawback for practical applications [58,59]. As such, the present work considers the acoustic energy-based model of [44].…”
Section: Acoustic Problem Formulation (mentioning)
confidence: 97%
“…This approach to person detection (presented in Hilsenbeck & Kirchner, 2011) is not 3D data reliant, but rather, is audio based. In this binaural approach the robot exploits the spectral structure (in the audio frequency domain) and characteristics specific to human-originating sound (speech or humming for instance) to robustly detect the presence and gender of human speakers.…”
Section: The Read Branch: Sensing and Perception Of Human Cues (mentioning)
confidence: 99%
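The statement above says the approach also detects the gender of human speakers, but the excerpt does not say how. A common and much cruder baseline is to estimate the fundamental frequency F0 of a voiced frame and threshold it. The sketch below swaps in time-domain autocorrelation pitch estimation and a hypothetical 165 Hz decision threshold; neither is claimed to be Hilsenbeck & Kirchner's method, and `estimate_f0` and `guess_gender` are illustrative names.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=70.0, f0_max=400.0):
    """Autocorrelation pitch estimate (Hz) for a frame assumed to be voiced."""
    x = frame - np.mean(frame)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # keep lags 0..len(x)-1
    lag_min = int(fs // f0_max)                           # shortest plausible period
    lag_max = min(int(fs // f0_min), len(ac) - 1)         # longest plausible period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

def guess_gender(f0, threshold_hz=165.0):
    """Crude heuristic (assumed threshold): adult male F0 is typically lower than female F0."""
    return "female" if f0 >= threshold_hz else "male"

# Usage: synthetic voiced frames at 120 Hz and 220 Hz (three harmonics each).
fs = 16000
t = np.arange(1024) / fs
low = sum(np.sin(2 * np.pi * k * 120.0 * t) / k for k in range(1, 4))
high = sum(np.sin(2 * np.pi * k * 220.0 * t) / k for k in range(1, 4))
f_low, f_high = estimate_f0(low, fs), estimate_f0(high, fs)
print(f_low, guess_gender(f_low), f_high, guess_gender(f_high))
```

Because the lag is an integer number of samples, the estimate is quantised (for example 16000/73 ≈ 219 Hz for the 220 Hz frame); the F0 ranges of male and female voices also overlap, so a single threshold will misclassify some speakers.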