Reward value guides goal-directed behavior and modulates early sensory processing. Rewarding stimuli are often multisensory, but it is not known how reward value is combined across sensory modalities. Here we show that the integration of reward value critically depends on whether the distinct sensory inputs are perceived to emanate from the same multisensory object. We systematically manipulated the congruency in monetary reward values and the relative spatial positions of co-occurring auditory and visual stimuli that served as bimodal distractors during an oculomotor task performed by healthy human participants (male and female). The amount of interference induced by the distractors was used as an indicator of their perceptual salience. Our results across two experiments show that when reward value is linked to each modality separately, the value congruence between vision and audition determines the combined salience of the bimodal distractors. However, the reward value of vision dominates over the value of audition if the two modalities are perceived to convey conflicting information regarding the spatial position of the bimodal distractors. These results show that in a task that relies heavily on the processing of visual spatial information, the reward values from multiple sensory modalities are integrated with each other, each with its own weight. This weighting depends on the strength of prior beliefs regarding a common source for incoming unisensory signals, based on their congruency in reward value and perceived spatial alignment.
Significance Statement
Real-world objects are typically multisensory, but it is not known how reward value is combined across sensory modalities. We examined how eye movements toward a visual target are modulated by the reward value of audiovisual distractors. Our results show that in the face of uncertainty as to whether co-occurring visual and auditory inputs belong to the same object, congruence in their reward values is used to guide audiovisual integration. However, when there is a strong prior that the unisensory inputs do not emanate from the same object, the associative value of vision dominates over that of audition. These results demonstrate that our brain uses a reward-sensitive, flexible weighting mechanism to decide whether incoming sensory signals should be combined or not.