In complex scenes, the identity of an auditory object can build up across seconds. Given that attention operates on perceptual objects, this perceptual buildup may alter the efficacy of selective auditory attention over time. Here, we measured identification of a sequence of spoken target digits presented with distracter digits from other directions to investigate the dynamics of selective attention. Performance was better when the target location was fixed rather than changing between digits, even when listeners were cued as much as 1 s in advance about the position of each subsequent digit. Spatial continuity not only avoided well-known costs associated with switching the focus of spatial attention, but also produced refinements in the spatial selectivity of attention across time. Continuity of target voice further enhanced this buildup of selective attention. Results suggest that when attention is sustained on one auditory object within a complex scene, attentional selectivity improves over time. Similar effects may come into play when attention is sustained on an object in a complex visual scene, especially in cases where visual object formation requires sustained attention.

source segregation | auditory scene analysis | spatial hearing | streaming | auditory mixture

In everyday situations, we are confronted with multiple objects that compete for our attention. Both stimulus-driven and goal-related mechanisms mediate the between-object competition to determine what will be brought to the perceptual foreground (1, 2). In natural scenes, objects come and go and the object of interest can change from moment to moment, such as when the flow of conversation shifts from one talker to another at a party. Thus, our ability to analyze objects in everyday settings is directly affected by how switching attention between objects affects perception.
Much of what we know about the effects of switching attention comes from visual experiments in which observers monitor rapid sequences of images or search for an item in a static field of objects (3, 4). Although these situations give insight into the time it takes to disengage attention from one object and reengage it on the next, they do not directly explore whether there are dynamic effects of sustaining attention on one object through time.

In contrast to visual objects, the identity of an auditory object is intimately linked to how the content of a sound evolves over time. Moreover, the process of forming an auditory object is known to evolve over seconds (5-8). Given that attention is object-based (9, 10), this refinement in object formation may directly impact the selectivity of attention in a complex auditory scene. Specifically, sustaining attention on one object in a complex scene may yield more refined selectivity to the attended object over time. In turn, switching attention to a new object may reset object formation and therefore reset attentional selectivity. If so, the cost of switching attention between objects may not only be related to the time required to dis-...
Binaural room impulse responses (BRIRs) were measured for sources at different azimuths and distances (up to 1 m) relative to a manikin located in four positions in a classroom. When the listener is far from all walls, reverberant energy distorts signal magnitude and phase independently at each frequency, altering monaural spectral cues, interaural phase differences, and interaural level differences. For the tested conditions, systematic distortion (comb-filtering) from an early intense reflection is only evident when a listener is very close to a wall, and then only in the ear facing the wall. Especially for a nearby source, interaural cues grow less reliable with increasing source laterality, and monaural spectral cues are less reliable in the ear farther from the sound source. Reverberation reduces the magnitude of interaural level differences at all frequencies; however, the direct-sound interaural time difference can still be recovered from the BRIRs measured in these experiments. Results suggest that bias and variability in sound localization behavior may vary systematically with listener location in a room as well as source location relative to the listener, even for nearby sources where there is relatively little reverberant energy.
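The interaural cues discussed above can be read directly off a measured BRIR pair. As a rough illustration (not the paper's actual analysis pipeline), the sketch below estimates the broadband ILD from the energy ratio of the two ear signals and recovers the direct-sound ITD by cross-correlating only an early time window of each BRIR, before most reflected energy arrives. The function names and the 2 ms window length are assumptions.

```python
import numpy as np

def broadband_ild_db(brir_left, brir_right):
    """Broadband interaural level difference (dB): total energy of the
    left-ear BRIR relative to the right-ear BRIR."""
    energy_left = np.sum(np.square(brir_left))
    energy_right = np.sum(np.square(brir_right))
    return 10.0 * np.log10(energy_left / energy_right)

def direct_sound_itd(brir_left, brir_right, fs, window_ms=2.0):
    """Estimate the direct-sound interaural time difference (s) by
    cross-correlating only the first `window_ms` of each BRIR, which
    excludes most reflections.  The returned lag is negative when the
    sound reaches the left ear first."""
    n = int(fs * window_ms / 1000.0)
    xcorr = np.correlate(brir_left[:n], brir_right[:n], mode="full")
    lag_samples = int(np.argmax(xcorr)) - (n - 1)
    return lag_samples / fs
```

On a synthetic BRIR pair (a unit impulse at one ear, a delayed and attenuated impulse at the other), these functions recover the expected level and time offsets; on real measured BRIRs, restricting the cross-correlation to the early window is what lets the direct-sound ITD survive the reverberation noted above.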
Seeing the image of a newscaster on a television set causes us to think that the sound coming from the loudspeaker is actually coming from the screen. How images capture sounds is mysterious because the brain uses different methods for determining the locations of visual versus auditory stimuli. The retina senses the locations of visual objects with respect to the eyes, whereas differences in sound characteristics across the ears indicate the locations of sound sources referenced to the head. Here, we tested which reference frame (RF) is used when vision recalibrates perceived sound locations. Visually guided biases in sound localization were induced in seven humans and two monkeys who made eye movements to auditory or audiovisual stimuli. On audiovisual (training) trials, the visual component of the targets was displaced laterally by 5-6°. Interleaved auditory-only (probe) trials served to evaluate the effect of experience with mismatched visual stimuli on auditory localization. We found that the displaced visual stimuli induced ventriloquism aftereffect in both humans (ϳ50% of the displacement size) and monkeys (ϳ25%), but only for locations around the trained spatial region, showing that audiovisual recalibration can be spatially specific. We tested the reference frame in which the recalibration occurs. On probe trials, we varied eye position relative to the head to dissociate head-from eye-centered RFs. Results indicate that both humans and monkeys use a mixture of the two RFs, suggesting that the neural mechanisms involved in ventriloquism occur in brain region(s) using a hybrid RF for encoding spatial information.
The effects of stimulus frequency and bandwidth on distance perception were examined for nearby sources in a simulated reverberant space. Sources to the side [containing reverberation-related cues and interaural level difference (ILD) cues] and to the front (without ILDs) were simulated. Listeners judged the distance of noise bursts presented at a randomly roving level from simulated distances ranging from 0.15 to 1.7 m. Six stimuli were tested, varying in center frequency (300-5700 Hz) and bandwidth (200-5400 Hz). Performance, measured as the correlation between simulated and response distances, was worse for frontal than for lateral sources. For both simulated directions, performance was inversely proportional to the low-frequency stimulus cutoff, independent of stimulus bandwidth. The dependence of performance on frequency was stronger for frontal sources. These correlation results were well summarized by considering how mean response, as opposed to response variance, changed with stimulus direction and spectrum: (1) little bias was observed for lateral sources, but listeners consistently overestimated distance for frontal nearby sources; (2) for both directions, increasing the low-frequency cutoff reduced the range of responses. These results are consistent with the hypothesis that listeners used a direction-independent but frequency-dependent mapping of a reverberation-related cue, not the ILD cue, to judge source distance.
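The "reverberation-related cue" invoked here is commonly modeled as the direct-to-reverberant energy ratio (DRR), which falls as source distance grows. Below is a minimal sketch of that cue under the assumption that the direct sound occupies the first few milliseconds after the onset peak of a (simulated) room impulse response; the function name and the 2.5 ms split point are illustrative assumptions, not the paper's method.

```python
import numpy as np

def direct_to_reverberant_db(ir, fs, direct_ms=2.5):
    """Direct-to-reverberant energy ratio (dB) of a room impulse
    response.  Everything within `direct_ms` of the onset peak counts
    as direct sound; the remainder counts as reverberation."""
    onset = int(np.argmax(np.abs(ir)))
    split = onset + int(fs * direct_ms / 1000.0)
    direct_energy = np.sum(np.square(ir[:split]))
    reverb_energy = np.sum(np.square(ir[split:]))
    return 10.0 * np.log10(direct_energy / reverb_energy)
```

Because reverberant energy in a room is roughly independent of source distance while direct energy follows the 1/r law, the DRR decreases monotonically with distance, which is what makes it usable as an absolute distance cue even when overall presentation level roves from trial to trial.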
To a first-order approximation, binaural localization cues are ambiguous: many source locations give rise to nearly the same interaural differences. For sources more than a meter away, binaural localization cues are approximately equal for any source on a cone centered on the interaural axis (i.e., the well-known "cone of confusion"). The current paper analyzes simple geometric approximations of a head to gain insight into localization performance for nearby sources. If the head is treated as a rigid, perfect sphere, interaural intensity differences (IIDs) can be broken down into two main components. One component depends on the head shadow and is constant along the cone of confusion (and covaries with the interaural time difference, or ITD). The other component depends only on the relative path lengths from the source to the two ears and is roughly constant for a sphere centered on the interaural axis. This second factor is large enough to be perceptible only when sources are within one or two meters of the listener. Results are not dramatically different if one assumes that the ears are separated by 160 deg along the surface of the sphere (rather than diametrically opposite one another). Thus, for nearby sources, binaural information should allow listeners to locate sources only to within a volume around a circle centered on the interaural axis: a "torus of confusion." The volume of the torus of confusion increases as the source approaches the median plane, degenerating to a volume around the median plane in the limit.
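The path-length component of the IID described above is easy to reproduce with elementary geometry. The sketch below places the ears diametrically opposite on a spherical head (the simplest of the configurations considered) and computes the two source-to-ear distances for a horizontal-plane source, plus the 1/r level-difference term those distances imply; the function names and the 0.09 m head radius are illustrative assumptions.

```python
import numpy as np

def ear_distances(src_dist, azimuth_rad, head_radius=0.09):
    """Straight-line distances (m) from a horizontal-plane source to the
    left and right ears of a spherical head.  The source sits at
    `src_dist` meters from the head center; azimuth 0 is straight ahead
    and positive azimuths are toward the left ear."""
    src = np.array([src_dist * np.sin(azimuth_rad),
                    src_dist * np.cos(azimuth_rad)])
    left_ear = np.array([head_radius, 0.0])
    right_ear = np.array([-head_radius, 0.0])
    return (np.linalg.norm(src - left_ear),
            np.linalg.norm(src - right_ear))

def path_length_iid_db(src_dist, azimuth_rad, head_radius=0.09):
    """IID component (dB, left ear re right) due only to 1/r spreading
    loss over the two path lengths -- the term that is roughly constant
    on a sphere centered on the interaural axis and that vanishes as the
    source recedes from the head."""
    d_left, d_right = ear_distances(src_dist, azimuth_rad, head_radius)
    return 20.0 * np.log10(d_right / d_left)
```

For a source 0.5 m away on the interaural axis this term is about 3 dB, but it shrinks below 0.2 dB by 10 m, and it is zero in the median plane by symmetry, consistent with the claim that it matters only within a meter or two of the listener.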