The perceptual evaluation of spatial audio systems may be based on individual auditory qualities such as localization accuracy or the perception of coloration, on overall criteria of perceptual accuracy such as plausibility and authenticity, or on detailed catalogues of auditory qualities. However, only the latter are suited to characterizing a simulation's technical shortcomings perceptually and thus allow for its focused improvement. Therefore, a common vocabulary containing all perceptual attributes that are relevant in this context appears desirable. Existing vocabularies for the evaluation of sound field synthesis, spatialization technologies and virtual acoustic environments were often generated ad hoc by authors, or were focused only on specific perceptual aspects or specific spatialization techniques. To overcome limitations with respect to the relevance and completeness of these vocabularies, we have developed a Spatial Audio Quality Inventory (SAQI) for the perceptual evaluation of all spatial audio technologies used for the (re)synthesis of acoustic environments. It is a consensus vocabulary comprising 48 verbal descriptors of auditory qualities assumed to be of practical relevance when comparing (re)synthesized sound fields to real or imagined references or amongst each other. The vocabulary was generated by a Focus Group of 21 German-speaking experts in virtual acoustics. Five additional experts helped verify the unambiguity of all descriptors and the related explanations. Moreover, an English translation was generated and verified by eight bilingual experts. This article describes the applied methodology and presents the English version of the final vocabulary.
Aiming at the perceptual evaluation of virtual acoustic environments (VAEs), 'plausibility' is introduced as a quality criterion that can be of value for many applications of virtual realities. We suggest a definition as well as an experimental operationalization for plausibility, referring to the perceived agreement with the listener's expectation towards an equivalent real acoustic event. A listening test methodology for the criterion-free assessment of the deviation from this non-explicit, inner reference is proposed. It requires the rating of corresponding real and simulated stimuli in a Yes/No test paradigm, and the analysis of the results according to signal detection theory. The specification of minimum effect hypotheses allows the testing of plausibility with any desired strictness. The approach is demonstrated with the perceptual evaluation of a system for dynamic binaural synthesis in two different development stages.
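The signal-detection analysis of a Yes/No test mentioned above can be illustrated with a minimal sketch. It assumes the textbook equal-variance Gaussian model, in which the sensitivity index is d' = z(H) - z(F), and it assumes that a "yes" response means "this stimulus was simulated". The function names, the log-linear correction and the example counts are illustrative assumptions, not taken from the paper, and the sketch does not implement the minimum effect hypothesis test the abstract refers to.

```python
from statistics import NormalDist


def yes_no_rates(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false-alarm rate) from raw Yes/No response counts.

    Assumed coding: a hit is a simulated stimulus judged "simulated",
    a false alarm is a real stimulus judged "simulated".
    A log-linear correction (add 0.5 per cell) keeps both rates away
    from exactly 0 or 1, where the z-transform would be undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return hit_rate, fa_rate


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(H) - z(F), equal-variance Gaussian model."""
    hit_rate, fa_rate = yes_no_rates(hits, misses, false_alarms, correct_rejections)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def criterion(hits, misses, false_alarms, correct_rejections):
    """Response bias c = -(z(H) + z(F)) / 2; zero means no bias."""
    hit_rate, fa_rate = yes_no_rates(hits, misses, false_alarms, correct_rejections)
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(fa_rate))


if __name__ == "__main__":
    # Hypothetical listener: 50 simulated and 50 real stimuli.
    print("d' =", d_prime(30, 20, 22, 28))
    print("c  =", criterion(30, 20, 22, 28))
```

A d' near zero means the listener could not separate simulated from real stimuli, independently of any tendency to answer "yes" or "no"; that separation of sensitivity from response bias is what makes the measure criterion-free.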
A simulation that is perceptually indistinguishable from the corresponding real sound field could be termed authentic. Using binaural technology, such a simulation would theoretically be achieved by reconstructing the sound pressure at a listener's ears. However, inevitable errors in measurement, rendering, and reproduction introduce audible degradations, as has been demonstrated in previous studies for anechoic environments and static binaural simulations (fixed head orientation). The current study investigated the authenticity of individual dynamic binaural simulations for three different acoustic environments (anechoic, dry, wet) using a highly sensitive listening test design. The results show that about half of the participants failed to reliably detect any differences for a speech stimulus, whereas all participants were able to do so for pulsed pink noise. Higher detection rates were observed in the anechoic condition than in the reverberant spaces, while the source position had no significant effect. It is concluded that authenticity depends mainly on how comprehensively the audio content provides spectral cues and on the amount of reverberation, whereas the source position plays a minor role. This is confirmed by a broad qualitative evaluation, suggesting that remaining differences mainly affect the tone color rather than the spatial, temporal or dynamical qualities.
Head-related transfer functions (HRTFs) were acoustically measured and numerically simulated for the FABIAN head and torso simulator on a full-spherical, high-resolution sampling grid. Moreover, HRTFs were acquired for 11 horizontal head-above-torso orientations, covering the typical range of motion of ±50°, making it possible to account for listeners' head movements in dynamic binaural auralizations in a physically correct manner. In the absence of an external reference for HRTFs, the measured and simulated data sets were cross-validated by applying auditory models for localization performance and spectral coloration, and by correlation analyses. The results indicate a high degree of similarity between the two data sets regarding all tested aspects, thus suggesting that they are free of systematic errors. The HRTF database is publicly available from https://doi.org/10.14279/depositonce-5718.2 and is accompanied by a wide range of headphone filters for use in binaural synthesis.