Emotional attention, the boosting of the processing of emotionally relevant stimuli, has, up to now, mainly been investigated within a sensory modality, for instance, by using emotional pictures to modulate visual attention. In real-life environments, however, humans typically encounter simultaneous input to several different senses, such as vision and audition. As multiple signals entering different channels might originate from a common, emotionally relevant source, the prioritization of emotional stimuli should be able to operate across modalities. In this study, we explored cross-modal emotional attention. Spatially localized utterances with emotional and neutral prosody served as cues for a visually presented target in a cross-modal dot-probe task. Participants were faster to respond to targets that appeared at the spatial location of emotional compared to neutral prosody. Event-related brain potentials revealed emotional modulation of early visual target processing at the level of the P1 component, with neural sources in the striate visual cortex being more active for targets that appeared at the spatial location of emotional compared to neutral prosody. These effects were not found using synthesized control sounds matched for mean fundamental frequency and amplitude envelope. These results show that emotional attention can operate across sensory modalities by boosting early sensory stages of processing, thus facilitating the multimodal assessment of emotionally relevant stimuli in the environment.