Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates

Vera-Diaz, Juan Manuel; Pizarro, Daniel; Macías-Guarasa, Javier

doi:10.3390/s18103418

Cited by 100 publications

(54 citation statements)

References 64 publications

(161 reference statements)


“…Here we propose an encoder-fully connected residual architecture whose residual blocks are based in the ones from [17]. The encoder-fully connected architectures are traditionally used in classification tasks like [18][19][20][21][22][23], actually it is used in a range of applications like Wi-Fi people detection [24], regression [25][26][27][28] or indoor localization [29,30]. We define the proposed CNN as ours for three main reasons: first, that is a CNN explicitly created for 1D signal processing; at next, we prepare an structure capable of process high and low frequencies with the use of long kernels; finally, we base our residual blocks in ResNet [17], but all of them have been adapted to 1D processing.…”

Section: CNN Architecture (mentioning)

confidence: 99%

Is the PPG Signal Chaotic?

et al. 2020


PhotoPlethysmoGraphic (PPG) signal is an easily accessible biological signal that gives valuable diagnostic information. The novelty is the study procedure of the dynamic of the PPG signals, in our case of young and healthy individuals, with Deep Neural Network, which allows finding the dynamic behavior at different timescales. On a small timescale, the dynamic behavior of the PPG signal is predominantly quasi-periodic. On a large timescale, a more complex dynamic diversity emerges, but never a chaotic behavior as earlier studies had reported. The procedure that determines the dynamics of the PPG signal consists of contrasting the dynamics of a PPG signal with well-known dynamics-named reference signals in this study-, mostly present in physical systems, such as periodic, quasi-periodic, aperiodic, chaotic or random dynamics. For this purpose, this paper provides two methods of analysis based on Deep Neural Network (DNN) architectures. The former uses a Convolutional Neural Network (CNN) architecture model. Upon training with reference signals, the CNN model identifies the dynamics present in the PPG signal at different timescales, assigning, according to a classification process, an occurrence probability to each of them. The latter uses a Recurrent Neural Network (RNN) based on a Long Short-Term Memory (LSTM) architecture. With each of the signals, whether reference signals or PPG signals, the RNN model infers an evolution function (nonlinear regression model) based on training data, and considers its predictive capability over a relatively short time horizon. INDEX TERMS Biological signal, DNN architectures, PPG signal dynamic, timescales.


“…As discussed in the previous section, an alternative tool to the direct computation of the psycho-acoustic metrics is needed to perform real-time PA evaluation within an IoT node. In recent years, deep neural networks (DNNs) have been extensively used to solve a wide range of problems related to audio signal processing, such as audio event detection [21][22][23], source separation [24] or source localization [25]. In this context, CNNs have been shown to be a powerful tool for many audio-related tasks, with internal layers that are able to learn optimized features capturing those signal properties that are relevant to solve the task at hand.…”

Section: Methods (mentioning)

confidence: 99%

Computation of Psycho-Acoustic Annoyance Using Deep Neural Networks

et al. 2019


Psycho-acoustic parameters have been extensively used to evaluate the discomfort or pleasure produced by the sounds in our environment. In this context, wireless acoustic sensor networks (WASNs) can be an interesting solution for monitoring subjective annoyance in certain soundscapes, since they can be used to register the evolution of such parameters in time and space. Unfortunately, the calculation of the psycho-acoustic parameters involved in common annoyance models implies a significant computational cost, and makes difficult the acquisition and transmission of these parameters at the nodes. As a result, monitoring psycho-acoustic annoyance becomes an expensive and inefficient task. This paper proposes the use of a deep convolutional neural network (CNN) trained on a large urban sound dataset capable of efficiently predicting psycho-acoustic annoyance from raw audio signals continuously. We evaluate the proposed regression model and compare the resulting computation times with the ones obtained by the conventional direct calculation approach. The results confirm that the proposed model based on CNN achieves high precision in predicting psycho-acoustic annoyance, predicting annoyance values with an average quadratic error of around 3%. It also achieves a very significant reduction in processing time, which is up to 300 times faster than direct calculation, making CNN designed a clear exponent to work in IoT devices.


“…In order to overcome these problems and providing an end-to-end solution, other approaches have proposed the use of 1D convolutions using the raw audio signals as input. Recent works have shown satisfactory results in this direction [20,21,22,23,24,25,26,27,28].…”

Section: Introduction (mentioning)

confidence: 95%

A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification

Naranjo-Alcazar, Perez-Castanos, Martín-Morató et al. 2020

IEEE Access


Residual learning is known for being a learning framework that facilitates the training of very deep neural networks. Residual blocks or units are made up of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or shortcut connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers making up a residual block. While residual networks for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, their adoption for 1D end-to-end architectures is still scarce in the audio domain. Thus, the suitability of different residual block designs for raw audio classification is partly unknown. The purpose of this paper is to compare, analyze and discuss the performance of several residual block implementations, the most commonly used in image classification problems, within a state-of-the-art CNN-based architecture for end-to-end audio classification using raw audio waveforms. Deep and careful statistical analyses over six different residual block alternatives are conducted, considering two well-known datasets and common input normalization choices. The results show that, while some significant differences in performance are observed among architectures using different residual block designs, the selection of the most suitable residual block can be highly dependent on the input data.
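The skip connection described in this abstract can be illustrated with a minimal sketch. It assumes a single-channel signal, two odd-length kernels, and a ReLU applied after the addition. That placement is only one of the possible alternatives and is not claimed to be any of the six designs the paper compares. The names `conv1d_same`, `relu`, and `residual_block_1d` are hypothetical, and batch normalization and multi-channel handling are omitted.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Cross-correlate x with an odd-length kernel, zero-padded so the
    output has the same length as the input."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x: List[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block_1d(
    x: List[float], kernel1: List[float], kernel2: List[float]
) -> List[float]:
    """Illustrative residual block: y = relu(x + conv(relu(conv(x)))).

    The input x is added back to the output of the stacked layers
    (the skip or shortcut connection) before the final activation.
    """
    h = relu(conv1d_same(x, kernel1))
    h = conv1d_same(h, kernel2)
    return relu([a + b for a, b in zip(x, h)])


if __name__ == "__main__":
    signal = [1.0, -2.0, 3.0]
    # With all-zero kernels the stacked layers contribute nothing, so the
    # block reduces to relu(signal): the identity mapping on positive samples.
    print(residual_block_1d(signal, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
```

With all-zero kernels the stacked layers output zeros and the block passes its positive inputs through unchanged, which is the identity-mapping behaviour the abstract refers to.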

