Particle-filter tracking of sounds for frequency-independent 3D audio rendering from distributed B-format recordings

Blochberger, Matthias; Zotter, Franz

doi:10.1051/aacus/2021012

Cited by 10 publications

(10 citation statements)

References 30 publications

“…The system also separately renders the residual ambient components, which represent the Ambisonic receiver signals after the source objects have been subtracted from them. Therefore, in essence, the proposed system may be viewed as a natural multireceiver extension to the Coding and Multi-Parameterisation of Ambisonic Sound Scenes (COMPASS) single-receiver method [7] and is also similar to the approach proposed recently in [8]. With an emphasis on developing a practical system, the proposed processing approach is implemented as a real-time Virtual Studio Technology (VST) audio plug-in.…”

Section: Introduction (mentioning)

confidence: 99%

“…The closest work to the method proposed in the present article is described in [8], which operated in the time domain and combined: building 2D planar activity maps based on broadband grid-scanning methods from each receiver, followed by peak-finding to ascertain source position estimates; subsequent particle-filtering based tracking of active sound objects; and then the application of broadband beamforming and spatialization of the objects, mixed with ambient rendering. Building on the work of [8], the proposed system instead operates in the time-frequency domain and lends particular emphasis on real-time operation. It forgoes the use of computationally expensive activity-map-based source position estimation in favor of continuous DoA estimation methods followed by computing the intersecting points between receivers.…”

Section: Introduction (mentioning)

confidence: 99%
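
The statement above mentions estimating source positions by "computing the intersecting points between receivers" from per-receiver DoA estimates. As a rough illustration only, the sketch below finds the point closest, in the least-squares sense, to a set of DoA lines. It is not taken from either paper: the function name `triangulate`, the receiver layout, and the noise-free DoAs are all assumptions made for the example, and the cited systems may compute the intersections differently.

```python
import numpy as np


def triangulate(positions, directions):
    """Return the point minimising the summed squared distance to the
    lines p_i + t * d_i (hypothetical helper, not from the cited papers)."""
    positions = np.asarray(positions, dtype=float)
    directions = np.asarray(directions, dtype=float)
    # Normalise each DoA vector to unit length.
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    dim = positions.shape[1]
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for p, d in zip(positions, directions):
        # Projector onto the subspace orthogonal to the DoA line.
        P = np.eye(dim) - np.outer(d, d)
        A += P
        b += P @ p
    # A is singular if all DoA lines are parallel; solve() then raises.
    return np.linalg.solve(A, b)


# Assumed example geometry: two receivers and one source, exact DoAs.
receivers = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
source = np.array([1.0, 2.0, 0.5])
doas = [source - np.array(r) for r in receivers]
estimate = triangulate(receivers, doas)
```

With noisy DoAs the lines generally do not meet, which is why a least-squares closest point is used here instead of an exact intersection. Note that this sketch treats each DoA as an infinite line and ignores its sign.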

“…Evaluations involving actual listening tests were conducted for the systems described in [26,27,31,16,20,9,21,42,24,8]. The mean opinion scores for the multipoint source separation method of [27] indicated robust spatial reproduction compared with a stereo reference but lower naturalness compared with a mono reference.…”

Section: Introduction (mentioning)

confidence: 99%

“…The work compares mixed and sparse sound-field decomposition approaches against conventional plane-wave nonsparse alternatives, with demonstrably better results when using the former. Finally, listening tests of the system described in [8] were conducted using either fixed listener positions or a dynamic listener following a predefined linear trajectory. It was demonstrated that the system outperformed nonparametric interpolation based alternatives in the majority of cases.…”

Section: Introduction (mentioning)

confidence: 99%

Object-Based Six-Degrees-of-Freedom Rendering of Sound Scenes Captured with Multiple Ambisonic Receivers

McCormack, Politis, McKenzie, et al. 2022. J. Audio Eng. Soc.

This article proposes a system for object-based six-degrees-of-freedom (6DoF) rendering of spatial sound scenes that are captured using a distributed arrangement of multiple Ambisonic receivers. The approach is based on first identifying and tracking the positions of sound sources within the scene, followed by the isolation of their signals through the use of beamformers. These sound objects are subsequently spatialized over the target playback setup, with respect to both the head orientation and position of the listener. The diffuse ambience of the scene is rendered separately by first spatially subtracting the source signals from the receivers located nearest to the listener position. The resultant residual Ambisonic signals are then spatialized, decorrelated, and summed together with suitable interpolation weights. The proposed system is evaluated through an in situ listening test conducted in 6DoF virtual reality, whereby real-world sound sources are compared with the auralization achieved through the proposed rendering method. The results of 15 participants suggest that in comparison to a linear interpolation-based alternative, the proposed object-based approach is perceived as being more realistic.

“…This function can map the data of the test set to one of the given categories, thus realizing the category prediction of unknown data. At present, common classifiers include decision tree, logistic regression, support vector machine (SVM), Naive Bayes, k-nearest neighbor algorithm (KNN), BP neural network, and deep learning [11][12][13].…”

Section: Introduction (mentioning)

confidence: 99%

Research on Audio Recognition Based on the Deep Neural Network in Music Teaching

Cui. 2022. Computational Intelligence and Neuroscience

Solfeggio is an important basic course for music majors, and audio recognition training is one of the important links. With the improvement of computer performance, audio recognition has been widely used in smart wearable devices. In recent years, the development of deep learning has accelerated the research process of audio recognition. However, there is a lot of sound interference in music teaching environment, which leads to the performance of the audio classifier that cannot meet the actual demand. In order to solve this problem, an improved audio recognition system based on YOLO-v4 is proposed, which mainly improves the network structure. First, Mel frequency cepstrum number is used to process the original audio and extract the corresponding features. Then, try to apply the YOLO-v4 model in the field of deep learning to the field of audio recognition and improve it by combining with the spatial pyramid pool module to strengthen the generalization ability of data in different audio formats. Second, the stacking method in ensemble learning is used to fuse the independent submodels of two different channels. Experimental results show that compared with other deep learning technologies, the improved YOLO-v4 model can improve the performance of audio recognition, and it has better performance in processing data of different audio formats, which shows better generalization ability.

Spatial audio signal processing for binaural reproduction of recorded acoustic scenes – review and challenges

Rafaely, Tourbabin, Habets, et al. 2022. Acta Acust.

Spatial audio has been studied for several decades, but has seen much renewed interest recently due to advances in both software and hardware for capture and playback, and the emergence of applications such as virtual reality and augmented reality. This renewed interest has led to the investment of increasing efforts in developing signal processing algorithms for spatial audio, both for capture and for playback. In particular, due to the popularity of headphones and earphones, many spatial audio signal processing methods have dealt with binaural reproduction based on headphone listening. Among these new developments, processing spatial audio signals recorded in real environments using microphone arrays plays an important role. Following this emerging activity, this paper aims to provide a scientific review of recent developments and an outlook for future challenges. This review also proposes a generalized framework for describing spatial audio signal processing for the binaural reproduction of recorded sound. This framework helps to understand the collective progress of the research community, and to identify gaps for future research. It is composed of five main blocks, namely: the acoustic scene, recording, processing, reproduction, and perception and evaluation. First, each block is briefly presented, and then, a comprehensive review of the processing block is provided. This includes topics from simple binaural recording to Ambisonics and perceptually motivated approaches, which focus on careful array configuration and design. Beamforming and parametric-based processing afford more flexible designs and shift the focus to processing and modeling of the sound field. Then, emerging machine- and deep-learning approaches, which take a further step towards flexibility in design, are described. 
Finally, specific methods for signal transformations such as rotation, translation and enhancement, enabling additional flexibility in reproduction and improvement in the quality of the binaural signal, are presented. The review concludes by highlighting directions for future research.
