Binaural systems are a promising class of three-dimensional (3D) auditory displays for high-definition personal 3D audio devices. They synthesize the sound pressure signals at the ears of a listener, the binaural signals, by means of head-related transfer functions (HRTFs). Rigid spherical microphone arrays (RSMAs) are widely used to capture sound pressure fields for binaural presentation to multiple listeners. However, the spatial resolution that RSMAs need for accurate binaural reproduction has not been studied in detail. The aim of this paper is to address this question objectively. We evaluated the spatial accuracy of binaural signals synthesized from the recordings of RSMAs with different numbers of microphones, using a model of a human head. We find that the synthesis of spectral cues is accurate up to a maximum frequency determined by the number of microphones. Nevertheless, we also identify a limit beyond which adding more microphones does not improve overall accuracy; this limit is higher for the interaural spectral cues than for the monaural ones.
The spherical harmonic decomposition can be applied to present realistically localized sound sources over headphones. The acoustic field, measured by a spherical microphone array, is first decomposed into a weighted sum of spherical harmonics evaluated at the microphone positions. The resulting decomposition is used to generate a set of virtual sources at various angles, which are then presented binaurally by applying the corresponding head-related transfer functions (HRTFs). Reproduction accuracy depends heavily on the spatial distribution of both the microphones and the virtual sources. Nearly uniform samplings of the sphere are used to position the microphones and thereby improve spatial accuracy; however, no previous studies have examined the optimal arrangement of the virtual sources. We evaluate the effect of the virtual source distribution on the accuracy of the synthesized HRTFs. Our study also considers the impact of spatial aliasing for a 252-channel spherical microphone array whose body is modeled as a rigid sphere the size of a human head. We evaluate the synthesis error against the target HRTF using the logarithmic spectral distance. We find that 362 virtual sources distributed on an icosahedral grid can synthesize the HRTF in the horizontal plane up to 9 kHz with a log-spectral distance below 5 dB.
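The log-spectral distance used as the error metric above has a standard definition: the RMS, over frequency bins, of the log-magnitude difference between the target and synthesized spectra. A minimal sketch of that definition follows; the function name and the epsilon guard against zero magnitudes are our own conveniences, not details from the paper.

```python
import numpy as np

def log_spectral_distance(h_target, h_synth, eps=1e-12):
    """Log-spectral distance (dB) between two spectra.

    h_target, h_synth: complex (or magnitude) spectra sampled on the
    same frequency bins. Returns the RMS of the per-bin log-magnitude
    difference, a common definition of the LSD.
    """
    diff_db = 20.0 * np.log10((np.abs(h_target) + eps) /
                              (np.abs(h_synth) + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Identical spectra give zero distance; a uniform 2x gain gives
# 20*log10(2) ~ 6.02 dB in every bin, hence an LSD of ~6.02 dB.
h = np.exp(1j * np.linspace(0, np.pi, 256))
print(log_spectral_distance(h, h))      # -> 0.0
print(log_spectral_distance(2 * h, h))  # -> ~6.02
```

A per-bin LSD curve (dropping the mean over bins) is also common when one wants to see at which frequencies the synthesis degrades, e.g. near the spatial aliasing limit.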
Spatial descriptions of the head-related transfer function (HRTF) commonly rely on spherical harmonics, which consider all directions simultaneously. In perceptual studies, however, it is often necessary to model the HRTF with different angular resolutions in different directions. To this end, an alternative spatial representation of the HRTF, based on local analysis functions, is introduced. The proposed representation is shown to capture the local features of the HRTF; this is verified by comparing its reconstruction error with that of the spherical harmonic decomposition when reconstructing the HRTF inside a spherical cap.
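The spherical harmonic decomposition that serves as the comparison baseline above can be sketched as a least-squares fit of SH coefficients to samples on the sphere. The sketch below is illustrative only: the sampling scheme, order, and helper names are our assumptions, not the papers' setups. It synthesizes an order-3 band-limited field on random directions and recovers its coefficients exactly.

```python
import numpy as np
from scipy.special import lpmv, factorial

def sph_harm_cs(n, m, theta, phi):
    """Complex spherical harmonic Y_n^m (Condon-Shortley convention).
    theta: polar angle, phi: azimuth; lpmv supplies P_n^|m|(cos theta)."""
    mm = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi)
                   * factorial(n - mm) / factorial(n + mm))
    y = norm * lpmv(mm, n, np.cos(theta)) * np.exp(1j * mm * phi)
    return ((-1) ** mm) * np.conj(y) if m < 0 else y

def sh_matrix(theta, phi, order):
    """Basis matrix with one column per (n, m) up to the given order."""
    cols = [sph_harm_cs(n, m, theta, phi)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(1)
M, order = 200, 3
phi = rng.uniform(0.0, 2 * np.pi, M)            # azimuth
theta = np.arccos(rng.uniform(-1.0, 1.0, M))    # polar, area-uniform

Y = sh_matrix(theta, phi, order)                # shape (M, (order+1)^2)

# Synthesize a band-limited field and recover its SH coefficients.
c_true = (rng.standard_normal((order + 1) ** 2)
          + 1j * rng.standard_normal((order + 1) ** 2))
p = Y @ c_true
c_est, *_ = np.linalg.lstsq(Y, p, rcond=None)
print(np.max(np.abs(c_est - c_true)))           # negligible residual
```

Restricting the sample directions to a spherical cap makes this global least-squares problem ill-conditioned, which is precisely the regime in which a local representation becomes attractive.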