A SPATIAL SQUEEZING APPROACH TO AMBISONIC AUDIO COMPRESSION

Bin Cheng, Christian Ritz and Ian Burnett
Whisper Laboratories, University of Wollongong, Wollongong, NSW, Australia
bc362@uow.edu.au, critz@uow.edu.au, ianb@uow.edu.au

ABSTRACT

Spatially Squeezed Surround Audio Coding (S3AC) has been previously shown to provide efficient coding with perceptually accurate soundfield reconstruction when applied to ITU 5.1 multichannel audio. This paper investigates the application of S3AC to the coding of Ambisonic audio recordings. Traditional Ambisonics achieve compression and backward compatibility through the use of the UHJ matrixing approach to obtain a stereo signal. In this paper the relationship to Ambisonic B-format signals is described and alternative approaches that derive a stereo or mono-downmix signal based on S3AC are presented and evaluated. The mono-downmix approach utilizes side information consisting of spatial cues that are quantized based on novel source localization listening experiments. Objective and subjective tests demonstrate significant improvements in the localization of sound sources resulting from decoding the compressed B-format signals to a 5.1 speaker playback.
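As background for the B-format relationship mentioned in the abstract, the following is a minimal sketch of the textbook first-order horizontal B-format encoding of a single source, followed by an azimuth estimate recovered from the W, X and Y channels. It is not the S3AC algorithm from the paper. The function names `encode_b_format` and `estimate_azimuth` are hypothetical, and the sketch assumes a single dominant source, the conventional 1/sqrt(2) weighting on W, and azimuth measured counter-clockwise from the front.

```python
import math


def encode_b_format(sample, azimuth_deg):
    """Encode one mono sample at a horizontal azimuth into (W, X, Y).

    Assumes the conventional first-order B-format weighting (W scaled by
    1/sqrt(2)); the height channel Z is omitted for a horizontal-only scene.
    """
    theta = math.radians(azimuth_deg)
    w = sample / math.sqrt(2.0)   # omnidirectional component
    x = sample * math.cos(theta)  # front-back figure-of-eight component
    y = sample * math.sin(theta)  # left-right figure-of-eight component
    return w, x, y


def estimate_azimuth(w_frames, x_frames, y_frames):
    """Estimate the azimuth (degrees) of one dominant source.

    Sums the products W*X and W*Y over a frame and takes atan2 of the
    two sums. This only makes sense when a single source dominates.
    """
    ix = sum(w * x for w, x in zip(w_frames, x_frames))
    iy = sum(w * y for w, y in zip(w_frames, y_frames))
    return math.degrees(math.atan2(iy, ix))


# Usage: encode a short test tone at 30 degrees and recover its direction.
signal = [math.sin(0.1 * n) for n in range(256)]
frames = [encode_b_format(s, 30.0) for s in signal]
w_ch, x_ch, y_ch = zip(*frames)
estimated = estimate_azimuth(w_ch, x_ch, y_ch)
```

In a time-frequency coder the same estimate would typically be made per frequency band and per frame rather than over the whole signal, which is an assumption about typical practice, not a statement about the paper.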
The derivation of spatial cues representing source localisation information is a typical component of multichannel spatial audio coders. Efficient compression of spatial cues based on psychoacoustic localisation features is investigated. Results show that the proposed quantisation approach for spatial cue compression achieves bit-rates of less than 6 kbit/s while preserving critical source localisation information.
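To make the link between cue quantisation and side-information bit-rate concrete, here is a minimal sketch that uniformly quantises an azimuth cue and counts the resulting bits per second. Everything in it is an illustrative assumption: the abstract does not specify the quantiser design, and the names `quantise_azimuth` and `side_info_bitrate` and their parameters (step size, number of bands, frame rate) are hypothetical. A quantiser driven by psychoacoustic localisation features would most likely use a non-uniform step, which this sketch does not attempt.

```python
import math


def quantise_azimuth(azimuth_deg, step_deg):
    """Uniformly quantise an azimuth and return (index, decoded_azimuth).

    Hypothetical uniform quantiser; the index wraps around so that values
    just below 360 degrees map back to index 0.
    """
    levels = round(360.0 / step_deg)
    index = round((azimuth_deg % 360.0) / step_deg) % levels
    return index, index * step_deg


def side_info_bitrate(bands, step_deg, frames_per_second):
    """Bits per second needed to send one azimuth index per band per frame.

    Assumes fixed-length codes with no entropy coding.
    """
    levels = round(360.0 / step_deg)
    bits_per_cue = math.ceil(math.log2(levels))
    return bands * bits_per_cue * frames_per_second


# Usage with made-up parameters (not taken from the paper).
index, decoded = quantise_azimuth(44.0, 3.0)
rate = side_info_bitrate(bands=20, step_deg=3.0, frames_per_second=40)
```

The bit-rate scales linearly with the number of bands and the frame rate but only logarithmically with angular resolution, which is why coarsening the step where localisation is less acute is an attractive way to save bits.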
Teleconferencing systems are becoming increasingly realistic and pleasant for users interacting with geographically distant meeting participants. Video screens display a complete view of the remote participants, using technology such as wraparound or multiple video screens. However, the corresponding audio does not offer the same sophistication: often only a mono or stereo track is presented. This paper proposes a teleconferencing audio recording and playback paradigm that captures the spatial location of the geographically distributed participants for rendering of the remote soundfields at the users' end. Utilizing standard 5.1 surround sound playback, this paper proposes a surround rendering approach that 'squeezes' the multiple recorded soundfields from remote teleconferencing sites to assist the user to disambiguate multiple speakers from different participating sites.
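One way to picture the 'squeezing' of several remote soundfields into a single playback soundfield is a linear remapping of each site's full 360-degree azimuth range into its own sector. The sketch below is purely an assumed illustration of that idea; the abstract does not give the actual mapping, and the function `squeeze_azimuth` and its equal-sector layout are hypothetical.

```python
def squeeze_azimuth(azimuth_deg, site_index, num_sites):
    """Map a source azimuth at one remote site into that site's sector.

    Hypothetical linear mapping: the playback circle is split into
    num_sites equal sectors, and each site's 0-360 degree range is
    compressed into the sector assigned to it.
    """
    sector = 360.0 / num_sites
    fraction = (azimuth_deg % 360.0) / 360.0
    return (site_index * sector + fraction * sector) % 360.0


# Usage: three sites, a talker at 180 degrees in the second site (index 1).
rendered = squeeze_azimuth(180.0, site_index=1, num_sites=3)
```

Under this assumed mapping, talkers from different sites never share a rendered direction, which is the disambiguation property the abstract describes.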
"A general compression approach to multi-channel three-dimensional audio," IEEE Transactions on Audio, Speech and Language Processing, vol. 21, no. 8, pp. 1676-1688, 2013.

Abstract: This paper presents a technique for low bit rate compression of three-dimensional (3D) audio produced by multiple loudspeaker channels. The approach is based on the time-frequency analysis of the localization of spatial sound sources within the 3D space as rendered by a multi-channel audio signal (in this case 16 channels). This analysis results in the derivation of a stereo downmix signal representing the original 16 channels. Alternatively, a mono-downmix signal with side information representing the location of sound sources within the 3D spatial scene can also be derived. The resulting downmix signals are then compressed with a traditional audio coder, resulting in a representation of the 3D soundfield at bit rates comparable with existing stereo audio coders while maintaining the perceptual quality produced from separate encoding of each channel. © 2006-2012 IEEE.