2021 29th European Signal Processing Conference (EUSIPCO)
DOI: 10.23919/eusipco54536.2021.9616344
PyAWNeS-Codec: Speech and audio codec for ad-hoc acoustic wireless sensor networks

Abstract: Existing hardware with microphones can potentially be used as sensor networks to capture speech and audio signals, yielding better signal quality than is possible with a single microphone. A central prerequisite for such ad-hoc acoustic wireless sensor networks (AWSNs) is an efficient communication protocol with which to transmit audio data between nodes. For that purpose, we present the world's first speech and audio codec especially designed for AWSNs, which has competitive quality also in single-chan…

Cited by 3 publications (5 citation statements). References 31 publications.
“…Particularly, for a constant sound pressure level, low frequencies are perceived more quietly in comparison to frequencies in the most sensitive area around 3000 Hz. One pre-processing technique that is commonly used for speech processing is pre-emphasis filtering [102]. This was initially applied to neural network modelling of guitar amplifiers by Dämskägg et al. [75], where a first-order high-pass filter was applied to both model output and target data before computing the loss.…”
Section: Pre-emphasis Filtering (mentioning; confidence: 99%)
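The pre-emphasis step described in the statement above amounts to running both the model output and the target through a first-order high-pass filter before the loss is computed. The following is a minimal NumPy sketch of that pipeline; the filter coefficient of 0.95 and the mean-squared-error loss are illustrative assumptions, not values taken from the cited works.

import numpy as np

def pre_emphasis(x, coeff=0.95):
    # First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]
    y = np.copy(x)
    y[1:] -= coeff * x[:-1]
    return y

def pre_emphasized_mse(output, target, coeff=0.95):
    # Filter both signals, then compute the loss on the filtered versions,
    # which weights errors at high frequencies more strongly.
    return float(np.mean((pre_emphasis(output, coeff) - pre_emphasis(target, coeff)) ** 2))

# Example: a slightly perturbed copy of the target yields a small loss value.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
output = target + 0.01 * rng.standard_normal(16000)
print(pre_emphasized_mse(output, target))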
“…Since we are interested in speech data, our investigations are focused on selected speech audio codecs. Namely, we use the 3GPP Enhanced Voice Services (EVS) codec (Bruhn et al., 2012), Opus (Valin et al., 2012), our own PyAWNeS-codec (Bäckström et al., 2021), and the neural codec Lyra (Kleijn et al., 2021). In particular,…”
Section: Codecs (mentioning; confidence: 99%)
“…A delay compensation mode is integrated into the EVS codec, allowing the compensation of the integrated delay of about 32 ms within the encoded output signal. The Python acoustic wireless network of sensors (PyAWNeS) codec is a speech and audio codec especially designed for distributed scenarios, where multiple independent devices sense and transmit the signal simultaneously (Bäckström et al., 2021). In contrast to prior codecs, it is designed to provide competitive quality in a single-channel mode, but such that quality is improved with every added sensor.…”
Section: Codecs (mentioning; confidence: 99%)
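In evaluation pipelines like the one quoted above, a known algorithmic delay is typically compensated by discarding that many milliseconds from the start of the decoded signal before comparing it with the reference. The sketch below illustrates this idea in NumPy; the 32 ms figure comes from the statement above, while the 16 kHz sampling rate and the helper function itself are hypothetical illustrations rather than part of any cited codec's API.

import numpy as np

def compensate_delay(decoded, reference, delay_ms=32.0, fs=16000):
    # Discard the codec's algorithmic delay from the decoded signal and
    # truncate both signals to a common length for sample-wise comparison.
    delay_samples = int(round(delay_ms * fs / 1000.0))
    aligned = decoded[delay_samples:]
    n = min(len(aligned), len(reference))
    return aligned[:n], reference[:n]

# Example: a signal delayed by 512 samples (32 ms at 16 kHz) lines up again.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
decoded = np.concatenate([np.zeros(512), reference])
aligned, ref = compensate_delay(decoded, reference)
print(np.allclose(aligned, ref))  # True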