2018
DOI: 10.3390/s18103418

Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates

Abstract: This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). In the proposed solution, the CNN is designed to directly estimate the three-dimensional position of a single acoustic source using the raw audio signal as the input information and avoiding the use of hand-crafted audio features. Given the limited amount of available localization data, we propose, in this paper, a training strategy based on two steps. We first tr…

Citations: cited by 100 publications (54 citation statements)
References: 64 publications (161 reference statements)
“…Here we propose an encoder-fully connected residual architecture whose residual blocks are based in the ones from [17]. The encoder-fully connected architectures are traditionally used in classification tasks like [18][19][20][21][22][23], actually it is used in a range of applications like Wi-Fi people detection [24], regression [25][26][27][28] or indoor localization [29,30]. We define the proposed CNN as ours for three main reasons: first, that is a CNN explicitly created for 1D signal processing; at next, we prepare an structure capable of process high and low frequencies with the use of long kernels; finally, we base our residual blocks in ResNet [17], but all of them have been adapted to 1D processing.…”
Section: CNN Architecture | Citation type: mentioning
confidence: 99%
“…As discussed in the previous section, an alternative tool to the direct computation of the psycho-acoustic metrics is needed to perform real-time PA evaluation within an IoT node. In recent years, deep neural networks (DNNs) have been extensively used to solve a wide range of problems related to audio signal processing, such as audio event detection [21][22][23], source separation [24] or source localization [25]. In this context, CNNs have been shown to be a powerful tool for many audio-related tasks, with internal layers that are able to learn optimized features capturing those signal properties that are relevant to solve the task at hand.…”
Section: Methods | Citation type: mentioning
confidence: 99%
“…In order to overcome these problems and providing an end-to-end solution, other approaches have proposed the use of 1D convolutions using the raw audio signals as input. Recent works have shown satisfactory results in this direction [20,21,22,23,24,25,26,27,28].…”
Section: Introduction | Citation type: mentioning
confidence: 95%