The ability to use temporal relationships between cross-modal cues facilitates perception and behavior. Previously we observed that temporally correlated changes in the size of a visual stimulus and the intensity of an auditory stimulus influenced the ability of listeners to perform an auditory selective attention task (Maddox et al., 2015). In this task participants detected timbral changes in a target sound while ignoring those in a simultaneously presented masker. When the visual stimulus was temporally coherent with the target sound, performance was significantly better than when it was temporally coherent with the masker sound, despite the visual stimulus conveying no task-relevant information. Here, we trained observers to detect audiovisual temporal coherence and asked whether this improved their ability to benefit from visual cues during the auditory selective attention task. We observed that these listeners improved performance in the auditory selective attention task and changed the way in which they benefited from a visual stimulus: after training, performance was better when the visual stimulus was temporally coherent with either the target or the masker stream, relative to the condition in which the visual stimulus was coherent with neither auditory stream. A second group, which trained to discriminate modulation-rate differences between temporally coherent audiovisual streams, improved task performance but did not change the way in which they used visual information. A control group did not change their performance between pretest and post-test. These results provide insights into how crossmodal experience may optimize multisensory integration.
Keywords: audiovisual integration, selective attention, auditory scene analysis, temporal processing, training

A problem in interpreting these varied effects is that many lab-based tasks do not capture well the complexity that the brain faces in real-world situations. In most lab-based paradigms, observers judge single auditory and visual signals presented in an otherwise quiet and dark environment. In contrast, in the real world, the brain must match one of several competing sounds to a given visual object (or vice versa). Moreover, due to the variance in the timing of real-world temporal coherence of auditory and visual streams. Prior to training, participants showed the expected effect of coherence condition (i.e., only target coherent > masker coherent). However,