2017
DOI: 10.20965/jrm.2017.p0037

Sound Source Localization Using Deep Learning Models

Abstract: This study proposes the use of a deep neural network to localize a sound source using an array of microphones in a reverberant environment. During the last few years, applications based on deep neural networks have performed various tasks such as image classification or speech recognition to levels that exceed even human capabilities. In our study, we em…

Figure: Using a deep learning model, the robot locates the sound source from a multichannel audio stream input.

citations
Cited by 112 publications
(83 citation statements)
references
References 19 publications
“…cross-correlation [1,5,8], inter-channel phase and level difference [9,10]) or the subspace-based features [2,11,12], whereas SNS classification normally requires features computed from the power spectrum [13,14]. Recently, it has been shown that instead of applying complicated feature extraction, we can directly use the power spectrum as the inputs for neural network-based sound source localization [15]. However, unlike in [15], our method employs the real and imaginary parts of the STFT, preserving both the power and phase information.…”
Section: Network Input (citation type: mentioning)
confidence: 99%
“…Recently, it has been shown that instead of applying complicated feature extraction, we can directly use the power spectrum as the inputs for neural network-based sound source localization [15]. However, unlike in [15], our method employs the real and imaginary parts of the STFT, preserving both the power and phase information. The raw data received by the robot are 4-channel audio signals sampled at 48 kHz.…”
Section: Network Input (citation type: mentioning)
confidence: 99%
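
The citation statement above contrasts power-spectrum inputs with inputs built from the real and imaginary parts of the STFT. A minimal sketch of the latter, using only NumPy, might look like the following. The function name `stft_real_imag`, the Hann window, the frame length of 2048 samples, and the hop of 1024 samples are illustrative assumptions; only the 4-channel, 48 kHz input format comes from the quoted text, and this is not the citing authors' implementation.

```python
import numpy as np


def stft_real_imag(audio, frame_len=2048, hop=1024):
    """Split multichannel audio into STFT frames and keep real and imaginary parts.

    audio: array of shape (channels, samples).
    Returns an array of shape (frames, channels, 2, frame_len // 2 + 1),
    where index 0 of the third axis is the real part and index 1 the imaginary part.
    frame_len and hop are assumed values, not taken from the source.
    """
    audio = np.asarray(audio, dtype=float)
    channels, n_samples = audio.shape
    if n_samples < frame_len:
        raise ValueError("audio is shorter than one frame")

    window = np.hanning(frame_len)
    n_frames = 1 + (n_samples - frame_len) // hop

    # Sample indices for every frame: shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]

    # Windowed frames: shape (channels, n_frames, frame_len).
    frames = audio[:, idx] * window

    # One-sided spectrum of each frame: shape (channels, n_frames, bins).
    spec = np.fft.rfft(frames, axis=-1)

    # Stack real and imaginary parts so that phase information is preserved.
    feats = np.stack([spec.real, spec.imag], axis=2)

    # Reorder to (frames, channels, 2, bins) so each frame is one network input.
    return feats.transpose(1, 0, 2, 3)


# Usage: one second of synthetic 4-channel audio at 48 kHz.
rng = np.random.default_rng(0)
audio = rng.standard_normal((4, 48000))
features = stft_real_imag(audio)
print(features.shape)  # (45, 4, 2, 1025)
```

The power spectrum can still be recovered from these features as the sum of squares of the real and imaginary parts, whereas the phase cannot be recovered from the power spectrum alone, which is the distinction the quoted passage draws.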
“…Opposed to the previously described works using refined features directly related to the localization problem, we can also find others using frequency domain features directly [48,52], in some cases generated from spectrograms of general time-frequency representations [51,54]. These approaches represent a step forward compared with the previous ones, as they give the network the responsibility of automatically learning the relationship between spectral cues and the location-related information. [57] kind of combines both strategies, as they use spectral features but calculate them in a cross-spectral fashion, that is, combining the values from all the available microphones in the so-called Cross Spectral Map (CSM).…”
Section: State of the Art (citation type: mentioning)
confidence: 99%
“…Significant emerging technologies being discussed in various research studies include wearable technologies, networked and smart environments connected by the Internet of Things, evolving tools, tangible interfaces, human-robot collaborations, processes and interactions, virtual reality, ubiquitous use of machine learning, and deep-learning algorithms [22]. In the last few years, applications based on deep neural networks have performed various tasks, such as speech recognition or image classification, to levels that even exceed human abilities [23][24][25]. SAE is a deep-learning algorithm.…”
Section: SAE (citation type: mentioning)
confidence: 99%