Audio rendering is generally used to increase the realism of virtual environments (VEs). It may also improve performance in specific tasks carried out in interactive applications such as games or simulators. In this article, we investigate the effect of sound rendering quality on performance in a task that is inherently vision-dominated. The task is a virtual traffic gap-crossing scenario with two elements: first, discriminating crossable from uncrossable gaps in oncoming traffic, and second, finding the right moment to start crossing the street without an accident. A study was carried out with 48 participants in an immersive virtual environment setup with a large screen and headphones. Participants were divided into three groups, each tested under a different condition. The first group was tested with spatialized audio rendering using head-related transfer function (HRTF) filtering, the second with conventional stereo rendering, and the third ran the experiment in a mute condition. Our results give clear evidence that spatialized audio improves task performance compared to the unimodal mute condition. Since all task-relevant information was within the participants' field of view, we conclude that the enhancement of task performance results from a bimodal advantage due to the integration of visual and auditory spatial cues.