Malaria mosquitoes mate in swarms. Within these swarms, they must rely on multiple sensory cues to shape their individual responses during tasks such as mate recognition, swarm maintenance, and collision avoidance. While male mosquitoes are known to use faint female flight tones to recognize their mates, the role of other sensory modalities remains less explored. By combining free-flight and tethered flight simulator experiments with Anopheles coluzzii, we demonstrate that swarming mosquitoes integrate visual and acoustic information to track conspecifics and avoid collisions. In tethered experiments, acoustic stimuli gated male steering responses to visual objects simulating nearby female mosquitoes, whereas visual cues alone triggered changes in wingbeat amplitude and frequency. Free-flight experiments show that mosquitoes modulate their flight responses to nearby conspecifics similarly to tethered animals, allowing for collision avoidance within swarms. These findings suggest that combined visual and acoustic information contributes to conspecific recognition within swarms and, for males, permits female tracking while avoiding collisions.