Visual capture and the ventriloquism aftereffect resolve spatial disparities of incongruent auditory-visual (AV) objects by shifting auditory spatial perception to align with vision. Here, we demonstrated the distinct temporal characteristics of visual capture and the ventriloquism aftereffect in response to brief AV disparities. In a set of experiments, subjects localized either the auditory component of AV targets (A within AV) or a second sound presented at varying delays (1–20 s) after AV exposure (A2 after AV). AV targets were trains of 1 or 20 brief presentations spanning a ±30° azimuthal range, with a ±8° (rightward or leftward) disparity. We found that the magnitude of visual capture generally reached its peak within a single AV pair and did not dissipate with time, whereas the ventriloquism aftereffect accumulated with repetitions of AV pairs and dissipated with time. Additionally, the magnitude of the auditory shift induced by each phenomenon was uncorrelated across listeners, and visual capture was unaffected by subsequent auditory targets, indicating that visual capture and the ventriloquism aftereffect are separate mechanisms with distinct effects on auditory spatial perception. Our results indicate that visual capture is a ‘sample-and-hold’ process that binds related objects and stores the combined percept in memory, whereas the ventriloquism aftereffect is a ‘leaky integrator’ process that accumulates with experience and decays with time to compensate for cross-modal disparities.
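The contrast between the two dynamics can be made concrete with a small numerical sketch. The Python snippet below is illustrative only: the functional forms, gains, saturation level, and time constant are assumptions chosen to mirror the qualitative description above, not parameters reported in the study.

```python
# Illustrative only: functional forms and parameter values are assumptions
# chosen to mirror the qualitative description, not values from the study.
import numpy as np

def visual_capture_shift(n_pairs, delay_s, shift_deg=6.0):
    """'Sample-and-hold': full strength from the first AV pair onward and no
    decay over the delay before the probe sound (hypothetical magnitude)."""
    return shift_deg if n_pairs >= 1 else 0.0

def aftereffect_shift(n_pairs, delay_s, gain_deg=0.4, cap_deg=4.0, tau_s=10.0):
    """'Leaky integrator': saturating accumulation over repeated AV pairs,
    followed by an assumed exponential decay over the post-exposure delay."""
    accumulated = cap_deg * (1.0 - np.exp(-gain_deg * n_pairs / cap_deg))
    return accumulated * np.exp(-delay_s / tau_s)

for n_pairs in (1, 20):
    for delay_s in (1.0, 20.0):
        print(f"pairs={n_pairs:2d}  delay={delay_s:4.1f} s  "
              f"capture={visual_capture_shift(n_pairs, delay_s):4.1f} deg  "
              f"aftereffect={aftereffect_shift(n_pairs, delay_s):4.2f} deg")
```

Under these assumed parameters, the capture term is constant across repetitions and delays, while the aftereffect term grows with the number of AV pairs and shrinks with the delay, reproducing the qualitative dissociation described above.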
The ventriloquism aftereffect (VAE) refers to a shift in auditory spatial perception following exposure to a spatial disparity between auditory and visual stimuli. The VAE has previously been measured on two distinct time scales. Hundreds or thousands of exposures to an audio-visual spatial disparity produce an enduring VAE that persists after exposure ceases, whereas exposure to a single audio-visual spatial disparity produces an immediate VAE that decays over seconds. To determine whether these phenomena are two extremes of a continuum or reflect distinct processes, we conducted an experiment with normal-hearing listeners that measured the VAE in response to a repeated, constant audio-visual disparity sequence, both immediately after exposure to each audio-visual disparity and after the end of the sequence. In each experimental session, subjects were exposed to sequences of auditory and visual targets that were consistently offset from one another by +8° or −8° in azimuth, then localized auditory targets presented in isolation following each sequence. Eye position was controlled throughout the experiment to avoid the effects of gaze on auditory localization. In contrast to other studies that did not control eye position, we found both a large shift in auditory perception that decayed rapidly after each AV disparity exposure and a gradual shift in auditory perception that grew over time and persisted after exposure to the AV disparity ceased. We modeled the temporal and spatial properties of the measured auditory shifts using grey-box nonlinear system identification and found that two models could explain the data equally well. In the power model, the temporal decay of the ventriloquism aftereffect follows a power law, producing an initial rapid drop in auditory shift followed by a long tail that accumulates with repeated exposure to audio-visual disparity. In the double exponential model, two separate processes are required to explain the data: one that accumulates and decays exponentially, and another that slowly integrates over time. Both models fit the data best when the spatial spread of the ventriloquism aftereffect was limited to a window around the location of the audio-visual disparity. We directly compare the predictions made by each model and suggest additional measurements that could help distinguish which model best describes the mechanisms underlying the VAE.
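The two candidate model forms can be sketched as follows. This is a minimal illustration under assumed parameter values (the amplitudes, exponent, and time constants are placeholders, not the fitted grey-box estimates); it is meant only to show how a power-law kernel and a pair of exponential processes would each accumulate over repeated exposures and decay after the sequence ends.

```python
# Illustrative sketch of the two candidate model forms; amplitudes, exponent,
# and time constants are placeholder assumptions, not fitted estimates.
import numpy as np

def power_model(exposure_times_s, t_s, amp=1.0, alpha=0.7, t0_s=0.1):
    """Each AV-disparity exposure contributes a shift that decays as a power
    law of elapsed time; contributions from repeated exposures sum, giving a
    rapid initial drop followed by a slowly accumulating tail."""
    dt = t_s - np.asarray(exposure_times_s)
    dt = dt[dt >= 0.0]
    return float(np.sum(amp * (dt + t0_s) ** -alpha))

def double_exponential_model(exposure_times_s, t_s, amp_fast=1.0, tau_fast_s=3.0,
                             amp_slow=0.05, tau_slow_s=300.0):
    """Two processes: a fast one that builds and decays exponentially with each
    exposure, and a slow one that integrates over exposures and decays slowly."""
    dt = t_s - np.asarray(exposure_times_s)
    dt = dt[dt >= 0.0]
    return float(np.sum(amp_fast * np.exp(-dt / tau_fast_s)
                        + amp_slow * np.exp(-dt / tau_slow_s)))

exposures = np.arange(0.0, 60.0, 2.0)   # one AV disparity every 2 s for 60 s
for probe_t in (60.0, 70.0, 120.0):     # probe sounds at and after sequence end
    print(f"t={probe_t:5.1f} s  power={power_model(exposures, probe_t):5.2f}  "
          f"double-exp={double_exponential_model(exposures, probe_t):5.2f}")
```

With these assumed values, both forms produce a large shift immediately after each exposure and a smaller residual shift that persists well after the sequence ends, which is why distinguishing them requires the additional measurements discussed above.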
What role do general-purpose, experience-sensitive perceptual mechanisms play in producing characteristic features of face perception? We previously demonstrated that different-colored, misaligned framing backgrounds, designed to disrupt perceptual grouping of face parts appearing upon them, disrupt holistic face perception. In the current experiments, a similar part-judgment task with composite faces was performed: face parts appeared in either misaligned, different-colored rectangles or aligned, same-colored rectangles. To investigate whether experience can shape impacts of perceptual grouping on holistic face perception, a pre-task fostered the perception of either (a) the misaligned, differently colored rectangle frames as parts of a single, multicolored polygon or (b) the aligned, same-colored rectangle frames as a single square shape. Faces appearing in the misaligned, differently colored rectangles were processed more holistically by those in the polygon-, compared with the square-, pre-task group. Holistic effects for faces appearing in aligned, same-colored rectangles showed the opposite pattern. Experiment 2, which included a pre-task condition fostering the perception of the aligned, same-colored frames as pairs of independent rectangles, provided converging evidence that experience can modulate impacts of perceptual grouping on holistic face perception. These results are surprising given the proposed impenetrability of holistic face perception and provide insights into the elusive mechanisms underlying holistic perception.
Vision typically has better spatial accuracy and precision than audition, and as a result often captures auditory spatial perception when visual and auditory cues are presented together. One determinant of visual capture is the amount of spatial disparity between auditory and visual cues: when disparity is small, visual capture is likely to occur, and when disparity is large, visual capture is unlikely. Previous experiments have used two methods to probe how visual capture varies with spatial disparity. First, congruence judgment assesses perceived unity between cues by having subjects report whether or not auditory and visual targets came from the same location. Second, auditory localization assesses the graded influence of vision on auditory spatial perception by having subjects point to the remembered location of an auditory target presented with a visual target. Previous research has shown that when both tasks are performed concurrently they produce similar measures of visual capture, but this may not hold when tasks are performed independently. Here, subjects alternated between tasks independently across three sessions. A Bayesian inference model of visual capture was used to estimate perceptual parameters for each session, which were compared across tasks. Results demonstrated that the range of audio-visual disparities over which visual capture was likely to occur was narrower in auditory localization than in congruence judgment, which the model indicates was caused by subjects adjusting their prior expectation that targets originated from the same location in a task-dependent manner.
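One common way to formalize such a model is Bayesian causal inference over a common-cause variable, in which the prior probability that the auditory and visual targets share a source controls how readily visual capture occurs. The sketch below follows that general formulation; the noise levels, prior width, prior probability of a common cause, and the model-averaging readout are illustrative assumptions, not the specific parameterization used in the study.

```python
# Minimal causal-inference sketch: sigma_a, sigma_v, sigma_prior, and p_common
# are illustrative assumptions; model averaging is one possible decision rule.
import numpy as np

def auditory_estimate(x_a, x_v, sigma_a=8.0, sigma_v=2.0,
                      sigma_prior=20.0, p_common=0.5):
    """Return P(common cause | measurements) and the auditory location estimate
    obtained by averaging over the two causal structures."""
    var_a, var_v, var_p = sigma_a**2, sigma_v**2, sigma_prior**2

    # Likelihood of the noisy measurements under a single shared source (C = 1)
    var_c1 = var_a * var_v + var_a * var_p + var_v * var_p
    like_c1 = (np.exp(-0.5 * ((x_a - x_v)**2 * var_p + x_a**2 * var_v
                              + x_v**2 * var_a) / var_c1)
               / (2 * np.pi * np.sqrt(var_c1)))

    # Likelihood under two independent sources (C = 2)
    like_c2 = (np.exp(-0.5 * (x_a**2 / (var_a + var_p) + x_v**2 / (var_v + var_p)))
               / (2 * np.pi * np.sqrt((var_a + var_p) * (var_v + var_p))))

    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Optimal auditory estimates under each causal structure (zero-mean prior)
    s_hat_c1 = (x_a / var_a + x_v / var_v) / (1 / var_a + 1 / var_v + 1 / var_p)
    s_hat_c2 = (x_a / var_a) / (1 / var_a + 1 / var_p)

    return post_c1, post_c1 * s_hat_c1 + (1 - post_c1) * s_hat_c2

# Example: auditory measurement at 0 deg, visual measurement 8 deg to the right
p_same, est = auditory_estimate(0.0, 8.0)
print(f"P(common cause) = {p_same:.2f}, auditory estimate = {est:+.1f} deg")
```

In this formulation, lowering p_common narrows the range of disparities over which the visual cue dominates the auditory estimate, which is the kind of task-dependent adjustment of the common-source prior described above.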