The middle temporal gyrus (MTG) has been shown to be recruited during the processing of words, but also during the observation of actions. Here we investigated how information related to words and gestures is organized along the MTG. To this end, we measured the BOLD response in the MTG to video clips of gestures and spoken words in 17 healthy human adults (male and female). Gestures consisted of videos of an actress performing object-use pantomimes (iconic representations of object-directed actions; e.g., playing guitar), emblems (conventional gestures; e.g., thumb up), and meaningless gestures. Word stimuli (verbs, nouns) consisted of video clips of the same actress pronouncing words. We found a stronger response to meaningful compared with meaningless gestures along the whole left and large portions of the right MTG. Importantly, we observed a gradient, with posterior regions responding more strongly to gestures (pantomimes and emblems) than words and anterior regions showing a stronger response to words than gestures. In an intermediate region in the left hemisphere, the response was significantly higher to words and emblems (i.e., items with a greater arbitrariness of the sign-to-meaning mapping) than to pantomimes. These results show that the large-scale organization of information in the MTG is driven by the input modality and may also reflect the arbitrariness of the relationship between sign and meaning.

Here we investigated the organizing principle of information in the middle temporal gyrus, taking into consideration the input modality and the arbitrariness of the relationship between a sign and its meaning. We compared the middle temporal gyrus response during the processing of pantomimes, emblems, and spoken words. We found that posterior regions responded more strongly to pantomimes and emblems than to words, whereas anterior regions responded more strongly to words than to pantomimes and emblems.
In an intermediate region, only in the left hemisphere, words and emblems evoked a stronger response than pantomimes. Our results identify two organizing principles of neural representation: the modality of communication (gestural or verbal) and the arbitrariness of the relationship between sign and meaning.