A recent development in human-computer interfaces is the virtual acoustic display, a device that synthesizes three-dimensional, spatial auditory information over headphones using digital filters constructed from head-related transfer functions (HRTFs). The utility of such a display depends on the accuracy with which listeners can localize virtual sound sources. A previous study [F. L. Wightman and D. J. Kistler, J. Acoust. Soc. Am. 85, 868-878 (1989)] observed accurate localization by listeners for free-field sources and for virtual sources generated from the subjects' own HRTFs. In practice, measurement of the HRTFs of each potential user of a spatial auditory display may not be feasible. Thus, a critical research question is whether listeners can obtain adequate localization cues from stimuli based on nonindividualized transforms. Here, inexperienced listeners judged the apparent direction (azimuth and elevation) of wideband noise bursts presented in the free field or over headphones; headphone stimuli were synthesized using HRTFs from a representative subject of Wightman and Kistler. When confusions were resolved, localization of virtual sources was quite accurate and comparable to that of free-field sources for 12 of the 16 subjects. Of the remaining subjects, 2 showed poor elevation accuracy in both stimulus conditions, and 2 showed degraded elevation accuracy with virtual sources. Many of the listeners also showed high rates of front-back and up-down confusions that increased significantly for virtual sources relative to free-field stimuli. These data suggest that while the interaural cues to horizontal location are robust, the spectral cues considered important for resolving location along a particular cone of confusion are distorted by a synthesis process that uses nonindividualized HRTFs.
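For readers unfamiliar with how such a display renders a source direction, the sketch below shows the basic binaural synthesis step: convolving a source signal with left- and right-ear head-related impulse responses. The sampling rate, filter lengths, and placeholder HRIRs are assumptions for illustration only, not the apparatus used in the study.

```python
# Minimal sketch of HRTF-based binaural synthesis. The HRIRs below are
# placeholders; in practice they come from measurements in the ear canal.
import numpy as np
from scipy.signal import fftconvolve

fs = 44100                                     # sample rate (Hz), assumed
duration = 0.25                                # 250-ms burst, as in the experiments
noise = np.random.randn(int(fs * duration))    # wideband noise source

# Placeholder impulse responses standing in for measured HRIRs.
hrir_left = np.random.randn(256) * np.hanning(256)
hrir_right = np.random.randn(256) * np.hanning(256)

# Convolve the source with each ear's impulse response to impose the
# direction-dependent spectral and interaural cues, then pair the channels
# for headphone presentation.
left = fftconvolve(noise, hrir_left)
right = fftconvolve(noise, hrir_right)
binaural = np.stack([left, right], axis=1)
```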
The aim of most attempts to synthesize a virtual auditory environment is duplication of the acoustical features of a real auditory environment. In other words, the sound pressure waveforms reaching a listener's eardrums in the virtual world should be the same as they would have been in a real environment. This goal is currently reachable only under certain laboratory conditions; thus, production of usable virtual sound systems involves a number of compromises. For example, error-free synthesis of an auditory object at an arbitrary point in space requires knowledge of the free-field-to-eardrum transfer functions (HRTFs) for both ears at all sound incidence angles. Since it is impossible to measure HRTFs at all sound incidence angles, some interpolation is required, and how best to accomplish it is a difficult question. In addition, HRTFs vary considerably from listener to listener, but it is not feasible to measure HRTFs from each potential user of a virtual sound system. Either a single HRTF must be chosen or some kind of average HRTF computed. Interpolation and the use of nonindividualized HRTFs are two of the many compromises that must be made in order to produce a usable virtual sound system. These compromises introduce error, the significance of which can only be assessed in psychophysical experiments. The experiments described here require listeners to judge the apparent positions of virtual auditory objects that are synthesized so that the error introduced by interpolation and other compromises is systematically manipulated. The perceptual consequences of the manipulations are evaluated by examining the variance of the apparent-position judgments, the discrepancy between apparent and intended position, and the frequency of front-back and up-down confusions. [Work supported by NIH and NASA.]
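As an illustration of the interpolation problem, the following sketch applies one naive scheme: inverse-distance weighting of the HRIRs measured at the directions nearest the target. It is not the method evaluated in these experiments; the function names and the choice of three neighbors are assumptions.

```python
# One naive HRIR interpolation scheme (illustrative only): weight the
# impulse responses of the measured directions nearest the target by
# inverse great-circle distance.
import numpy as np

def angular_distance(a, b):
    """Great-circle distance between two (azimuth, elevation) pairs in degrees."""
    az1, el1, az2, el2 = map(np.radians, (a[0], a[1], b[0], b[1]))
    cos_d = (np.sin(el1) * np.sin(el2)
             + np.cos(el1) * np.cos(el2) * np.cos(az1 - az2))
    return np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))

def interpolate_hrir(target, measured):
    """measured: dict mapping (azimuth, elevation) -> HRIR array of equal length."""
    dists = {pos: angular_distance(target, pos) for pos in measured}
    nearest = sorted(dists, key=dists.get)[:3]            # three closest directions
    weights = np.array([1.0 / (dists[p] + 1e-6) for p in nearest])
    weights /= weights.sum()
    return sum(w * measured[p] for w, p in zip(weights, nearest))
```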
There are numerous reports in the psychoacoustical literature that human listeners can localize sound sources reasonably well with one ear. Since interaural difference cues are presumably eliminated in monaural conditions, the so-called “monaural spectral cues” introduced by pinna filtering are assumed to provide the information necessary for accurate localization in monaural conditions. In these experiments, listeners localize wideband noise bursts presented either in free-field (with one ear occluded) or via headphones (with the signal to one phone either attenuated or disconnected). In the headphone conditions, pinna filtering effects are added digitally, such that the waveforms at a listener's eardrum are nearly the same as those produced by free-field sources. The noise spectrum is scrambled from trial to trial to prevent learning. With the noise bursts presented at about 30 dB SL in free-field, the results are consistent with other recent reports and suggest that some ability to localize, especially in the vertical direction, is retained in monaural conditions. However, when the identical stimuli are presented via headphones, there is no indication that sources can be localized monaurally. In other conditions, listeners localize constant spectrum stimuli, free-field stimuli at 70 dB SL, and binaural headphone stimuli with one ear attenuated. Results from these conditions suggest that monaural localization in free-field is most likely mediated by small head movements, a priori knowledge of the stimulus spectrum, and acoustical leakage through the ear-occluding devices used to monauralize the listeners. [Work supported by NIH and NASA.]
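The trial-to-trial spectral scrambling mentioned above could, in principle, be implemented along the lines of the sketch below, which perturbs the level of each frequency band independently. The band edges and the 20-dB range are assumptions borrowed from the companion experiment, not a description of the actual stimulus-generation software.

```python
# Rough sketch of per-trial spectral scrambling: randomize the level of each
# frequency band independently within a fixed range. Band edges and the
# 20-dB range are assumptions for illustration.
import numpy as np

def scramble_spectrum(noise, fs, band_edges, range_db=20.0, rng=None):
    """Return a copy of `noise` with each band's level perturbed at random."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(noise), 1.0 / fs)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        gain_db = rng.uniform(-range_db / 2, range_db / 2)
        spectrum[(freqs >= lo) & (freqs < hi)] *= 10 ** (gain_db / 20)
    return np.fft.irfft(spectrum, n=len(noise))
```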
In a series of localization experiments, blindfolded listeners were asked to give the apparent azimuth and elevation coordinates of 250-ms broadband noise bursts. The noise bursts were spectrally shaped so that the spectrum level in each critical band was set randomly (from trial to trial) within a 20-dB range. In a reference condition, the sounds were transduced by miniature loudspeakers placed at 36 locations in an anechoic chamber. Performance in this free-field condition agreed with previous results in the literature except for an increased frequency of front-back reversals, which is thought to result from the lack of a visual frame of reference. In a control condition, digital techniques were used to synthesize headphone-presented stimuli that were nearly identical, as measured in listeners' ear canals, to the free-field stimuli. Localization performance with these stimuli was virtually the same as in the free field, thus verifying the adequacy of the simulation. The digital filters used to synthesize the stimuli were then modified such that in one condition the interaural time differences were removed from all 36 stimuli, and in another condition the interaural amplitude differences were removed. In both of these conditions, localization performance was degraded; however, preliminary results suggest that the degradation caused by removal of interaural time differences is considerably greater than that caused by removal of interaural intensity differences. [Work supported by NIH, NSF, and NASA.]
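As a rough sketch of what removing one interaural cue from a synthesis filter pair might look like, the code below time-aligns the left- and right-ear impulse responses by cross-correlation so that the interaural time difference is eliminated while level differences are preserved. This is an illustration only, not the filter manipulation actually used in the study.

```python
# Illustrative removal of the interaural time difference from an HRIR pair:
# estimate the interaural delay by cross-correlation and shift one ear's
# response so both are time-aligned, leaving level differences intact.
import numpy as np
from scipy.signal import correlate

def remove_itd(hrir_left, hrir_right):
    xcorr = correlate(hrir_left, hrir_right, mode="full")
    lag = np.argmax(np.abs(xcorr)) - (len(hrir_right) - 1)   # delay in samples
    # Circular shift is an approximation, acceptable for short, decaying HRIRs.
    return hrir_left, np.roll(hrir_right, lag)

# Removing interaural level differences could instead equalize the RMS of the
# two responses, e.g.:
#   hrir_right *= np.sqrt(np.sum(hrir_left**2) / np.sum(hrir_right**2))
```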