To interpret the world and make accurate perceptual decisions, the brain must combine information across sensory modalities. For instance, it must combine vision and hearing to localize objects based on their image and sound. Probability theory suggests that evidence from multiple independent cues should be combined additively, but it is unclear whether mice and other mammals do this, and the cortical substrates of multisensory integration are uncertain. Here we show that, to localize a stimulus, mice combine auditory and visual spatial cues additively, a computation supported by unisensory processing in auditory and visual cortex and additive multisensory integration in frontal cortex. We developed an audiovisual localization task in which mice turn a wheel to indicate the joint position of an image and a sound. Scanning optogenetic inactivation of dorsal cortex showed that auditory and visual areas contribute unisensory information, whereas frontal cortex (secondary motor area, MOs) contributes multisensory information to the mouse's decision. Neuropixels recordings of >10,000 neurons indicated that neural activity in MOs reflects an additive combination of visual and auditory signals. An accumulator model applied to the sensory representations of MOs neurons reproduced behaviourally observed choices and reaction times. This suggests that MOs integrates information from multiple sensory cortices, providing a signal that is then transformed into a binary decision by a downstream accumulator.
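
The statement that independent cues should be combined additively can be illustrated with a minimal sketch. It assumes a binary left/right choice, two cues that are conditionally independent given the true side, and evidence expressed as log odds; under those assumptions Bayes' rule reduces to a sum of the per-cue log odds minus the prior log odds. The function names (`log_odds`, `sigmoid`, `combine_additive`) and the example probabilities are hypothetical and chosen for illustration; this is not the fitted behavioural model from the study.

```python
import math


def log_odds(p):
    """Convert a probability of 'right' into log odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Convert log odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_additive(p_visual, p_auditory, prior=0.5):
    """Posterior P(right) given both cues.

    Assumption: each input is P(right | that cue alone) under the same
    prior, and the cues are conditionally independent given the true side.
    The posterior log odds are then the sum of the single-cue log odds,
    with the prior log odds subtracted once so it is not counted twice.
    """
    combined = log_odds(p_visual) + log_odds(p_auditory) - log_odds(prior)
    return sigmoid(combined)


if __name__ == "__main__":
    # Coherent cues reinforce each other.
    print(round(combine_additive(0.8, 0.8), 3))
    # Equal and opposite cues cancel.
    print(round(combine_additive(0.8, 0.2), 3))
    # An uninformative cue leaves the other cue's estimate unchanged.
    print(round(combine_additive(0.5, 0.8), 3))
```

In this sketch, two coherent cues yield a more confident estimate than either alone, conflicting cues of equal strength cancel, and an uninformative cue contributes nothing, which is the qualitative signature of additive combination described above.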