Perceptual Impact on Localization Quality Evaluations of Common Pre-Processing for Non-Individual Head-Related Transfer Functions

Andreopoulou, Areti; Katz, Brian F. G.

doi:10.17743/jaes.2022.0008

Cited by 7 publications

(4 citation statements)

References 32 publications

“…The HRTF phase was disregarded, assuming that the up-sampled HRTFs would be reconstructed using a minimum-phase approximation and a simple ITD model. It is known that such simplifications could have an impact on certain perceptual features of the HRTFs Andreopoulou and Katz, (2022), therefore further research beyond this pilot will probably need to also consider phase information.…”

Section: Preprocessing (mentioning)

confidence: 99%

Spatial up-sampling of HRTF sets using generative adversarial networks: A pilot study

Siripornpitak

Engel

Squires

et al. 2022

Front. Signal Process.

Headphones-based spatial audio simulations rely on Head-related Transfer Functions (HRTFs) in order to reconstruct the sound field at the entrance of the listener’s ears. A HRTF is strongly dependent on the listener’s specific anatomical structures, and it has been shown that virtual sounds recreated with someone else’s HRTF result in worse localisation accuracy, as well as altering other subjective measures such as externalisation and realism. Acoustic measurements of the filtering effects generated by ears, head and torso has proven to be one of the most reliable ways to obtain a personalised HRTF. However this requires a dedicated and expensive setup, and is time-intensive. In order to simplify the measurement setup, thereby improving the scalability of the process, we are exploring strategies to reduce the number of acoustic measurements without degrading the spatial resolution of the HRTF. Traditionally, spatial up-sampling of HRTF sets is achieved through barycentric interpolation or by employing the spherical harmonics framework. However, such methods often perform poorly when the provided HRTF data is spatially very sparse. This work investigates the use of generative adversarial networks (GANs) to tackle the up-sampling problem, offering an initial insight about the suitability of this technique. Numerical evaluations based on spectral magnitude error and perceptual model outputs are presented on single spatial dimensions, therefore considering sources positioned only in one of the three main planes: Horizontal, median, and frontal. Results suggest that traditional HRTF interpolation methods perform better than the proposed GAN-based one when the distance between measurements is smaller than 90°, but for the sparsest conditions (i.e., one measurement every 120°–180°), the proposed approach outperforms the others.

“…training on the VR pointing procedure (similar to [17]) and investigate HRTF processing algorithms [26]. Finally, a high level of front-back reversal might have been caused by the visual environment dominating the perception when the non-individual auditory cues were vague: lack of visual source in the front of the listener led participants to believe that the sound must have come from their back (even if the subjects were informed that the sounds might come from any direction, both front and back).…”

Section: Individual Best (mentioning)

confidence: 99%

Initial Evaluation of an Auditory-Model-Aided Selection Procedure for Non-Individual HRTFs

Daugintis

Barumerli

Geronazzo

et al. 2022

Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023

Binaural spatial audio reproduction systems use measured or simulated head-related transfer functions (HRTFs), which encode the effects of the outer ear and body on the incoming sound to recreate a realistic spatial auditory field around the listener. The sound localisation cues embedded in the HRTF are highly personal. Establishing perceptual similarity between different HRTFs in a reliable manner is challenging due to a combination of acoustic and non-acoustic aspects affecting our spatial auditory perception. To account for these factors, we propose an automated procedure to select the 'best' non-individual HRTF dataset from a pool of measured ones. For a group of human participants with their own acoustically measured HRTFs, a multi-feature Bayesian auditory sound localisation model is used to predict individual localisation performance with the other HRTFs from within the group. Then, the model selection of the 'best' and the 'worst' non-individual HRTFs is evaluated via an actual localisation test and a subjective audio quality assessment in comparison with individual HRTFs. A successful model-aided objective selection of the 'best' non-individual HRTF may provide relevant insights for effective and handy binaural spatial audio solutions in virtual/augmented reality (VR/AR) applications.

“…Even though time is more intuitive than frequency to be modeled as the fourth dimension, time-based approach to HRTF modeling has been proven to be less accurate than the frequency-based one [20,35]. Furthermore, HRIRs include information on phase, which is widely acknowledged to be irrelevant to localization as long as ITD is preserved [8,36,37], although conflicting results have been presented regarding whether or not the phase linearization is detectable, particularly at low frequencies [16,38,39]. This work focuses only on the magnitude part of HRIR spectra, assuming that the phase can be either linearized basing on ITD or modeled independently.…”

Section: Discrete HRIR Data to 4D Basis Function Domain (mentioning)

confidence: 99%

“…Breebaart et al investigated parametrizing HRTFs on ERB scale, where for most positions around 20 parameters were needed to achieve perceptual irrelevance of the approximation [47]. On the other hand, Andreopoulou and Katz reported results, in which minimum-phase HRTFs for η = 32 were clearly discernible from full-phase 256-sample data [39]. This discrepancy might be caused by using prescreening, which ensured that the subjects were very competent for the task, with above-average hearing abilities.…”

Section: Space and Frequency Approximation Orders (mentioning)

confidence: 99%

Optimization of piano tuning by means of spectral entropy minimization

Szwajcowski

Pilch

2020

Applied Acoustics
