Stimulus locations are detected differently by different sensory systems, but ultimately they yield similar percepts and behavioral responses. How the brain transcends initial differences to compute similar codes is unclear. We quantitatively compared the reference frames of two sensory modalities, vision and audition, across three interconnected brain areas involved in generating saccades, namely the frontal eye fields (FEF), lateral and medial intraparietal cortex (LIP/MIP), and superior colliculus (SC). We recorded from single neurons in head-restrained monkeys performing auditory- and visually-guided saccades from variable initial fixation locations, and evaluated whether their receptive fields were better described as eye-centered, head-centered, or hybrid (i.e., not anchored uniquely to head or eye orientation). We found a progression of reference frames across areas and across time, with considerable hybrid-ness and persistent differences between modalities during most epochs and brain regions. For both modalities, the SC was more eye-centered than the FEF, which in turn was more eye-centered than the predominantly hybrid LIP/MIP. In all three areas and temporal epochs from stimulus onset to movement, visual signals were more eye-centered than auditory signals. In the SC and FEF, auditory signals became more eye-centered at the time of the saccade than they were initially after stimulus onset, but only in the SC at the time of the saccade did the auditory signals become predominantly eye-centered. The results indicate that visual and auditory signals both undergo transformations, ultimately reaching the same final reference frame but via different dynamics across brain regions and time.
SIGNIFICANCE STATEMENT
Models for visual-auditory integration posit that visual signals are eye-centered throughout the brain, while auditory signals are converted from head-centered to eye-centered coordinates. We show instead that both modalities largely employ hybrid reference frames: neither fully head- nor eye-centered. In three multimodal regions involved in orienting behaviors (intraparietal cortex, frontal eye field, and superior colliculus) these mixed codes persist in various proportions, shifting towards eye-centeredness both across time and across brain areas. Throughout, visual signals are more eye-centered than auditory signals, until a common predominantly eye-centered code for sound finally emerges during the saccade burst in the superior colliculus. In summary, visual and auditory signals reach the same final reference frame but via different dynamics across brain regions and time.