To interact with our dynamic environment, the brain merges motion information from auditory and visual senses. However, not only "natural" auditory MOTION but also "metaphoric" de/ascending PITCH and SPEECH (e.g., "left/right") influence the visual motion percept. Here, we systematically investigate whether these three classes of direction signals influence visual motion perception through shared or distinct neural mechanisms. In a visual-selective attention paradigm, subjects discriminated the direction of visual motion at several levels of reliability, with an irrelevant auditory stimulus being congruent, absent, or incongruent. Although the natural, metaphoric, and linguistic auditory signals were equally long and adjusted to induce a comparable directional bias on the motion percept, they influenced visual motion processing at different levels of the cortical hierarchy. A significant audiovisual interaction was revealed for MOTION in left human motion complex (hMT+/V5+) and for SPEECH in right intraparietal sulcus. In fact, the audiovisual interaction gradually decreased in left hMT+/V5+ for MOTION > PITCH > SPEECH and in right intraparietal sulcus for SPEECH > PITCH > MOTION. In conclusion, natural motion signals are integrated in audiovisual motion areas, whereas the influence of culturally learnt signals emerges primarily in higher-level convergence regions.