The temporal dynamics of brain activation during visual and auditory perception of congruent vs. incongruent musical video clips were investigated in 12 musicians from the Milan Conservatory of Music and 12 controls. A total of 368 videos of a clarinetist and a violinist playing the same score with their instruments were presented. The sounds were similar in pitch, intensity, rhythm and duration. To produce an audiovisual discrepancy, the visual information was incongruent with the soundtrack in pitch in half of the trials. Event-related potentials (ERPs) were recorded from 128 sites. An N400-like negative deflection was elicited by the incongruent audiovisual information only in musicians, and only for their own instrument. SwLORETA applied to the N400 response identified the areas mediating multimodal motor processing: the prefrontal cortex, the right superior and middle temporal gyri, the premotor cortex, the inferior frontal and inferior parietal areas, the extrastriate body area (EBA), the somatosensory cortex, the cerebellum and the supplementary motor area (SMA). The data indicate the existence of audiomotor mirror neurons responding to incongruent visual and auditory information, suggesting that they may encode multimodal representations of musical gestures and sounds. These systems may underlie the ability to learn how to play a musical instrument.