A speech–action repository (SAR), or "mental syllabary", has been proposed as a central module for the sensorimotor processing of syllables. In this approach, syllables that occur frequently in a language are assumed to be stored as holistic sensorimotor patterns, whereas infrequent syllables must be assembled from sub-syllabic units. Frequent syllables are thus processed quickly and efficiently during production or perception through direct activation of their stored sensorimotor patterns. Although several behavioral psycholinguistic studies have provided evidence for the existence of a syllabary, fMRI studies have so far failed to demonstrate its neural reality. In the present fMRI study, a reaction paradigm with homogeneous vs. heterogeneous syllable blocks was used, combining overt vs. covert speech production with auditory vs. visual presentation modes. Two complementary analyses were performed. (1) A logical conjunction analysis assessed activation for syllable processing independent of input modality and response mode, in order to seek support for the assumption that a supramodal hub exists within the SAR. (2) In addition, priming effects on the BOLD response in homogeneous vs. heterogeneous blocks were measured to identify brain regions that show reduced activity across repeated production/perception of the same syllable, and thereby to determine state maps. The auditory-visual conjunction analysis revealed an activation network comprising the bilateral precentral gyrus (PrCG) and the left inferior frontal gyrus (IFG; area 44). These results are compatible with the notion of a supramodal hub within the SAR. The main effect of homogeneity priming revealed activation in areas of the frontal, temporal, and parietal lobes. These findings are taken to represent sensorimotor state maps of the SAR. In conclusion, the present study provides preliminary evidence for the existence of a SAR.
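The following toy sketch shows the logic of the two analyses on a handful of invented voxel values. It is not the authors' pipeline, which would run in a standard fMRI package on full statistical maps. It assumes (1) that the logical conjunction is implemented by requiring a voxel to exceed a common threshold in every contrast map, which is equivalent to thresholding the voxel-wise minimum statistic, and (2) that the priming effect is expressed as the heterogeneous-minus-homogeneous response difference, where positive values indicate repetition suppression. The function names, values, and threshold are hypothetical.

```python
# Hypothetical toy illustration of the two analysis logics.
# All values and the threshold are invented; they are not data from the study.


def conjunction_mask(stat_maps, threshold):
    """Logical conjunction: a voxel survives only if its statistic exceeds
    `threshold` in every map, i.e. the voxel-wise minimum exceeds it."""
    n_voxels = len(stat_maps[0])
    if any(len(m) != n_voxels for m in stat_maps):
        raise ValueError("all maps must have the same number of voxels")
    min_stat = [min(m[v] for m in stat_maps) for v in range(n_voxels)]
    return [s > threshold for s in min_stat]


def repetition_suppression(homogeneous, heterogeneous):
    """Priming contrast per voxel (heterogeneous minus homogeneous):
    positive values mean a weaker response when one syllable is repeated."""
    if len(homogeneous) != len(heterogeneous):
        raise ValueError("both conditions must cover the same voxels")
    return [het - hom for hom, het in zip(homogeneous, heterogeneous)]


if __name__ == "__main__":
    auditory = [4.1, 2.0, 5.3, 0.5]  # hypothetical t-values, auditory input
    visual = [3.8, 4.5, 3.2, 1.0]    # hypothetical t-values, visual input
    mask = conjunction_mask([auditory, visual], threshold=3.1)
    print(mask)  # [True, False, True, False]

    # Hypothetical mean responses in two voxels per block type.
    effect = repetition_suppression(homogeneous=[1.2, 2.0], heterogeneous=[1.8, 1.9])
    print(effect)
```

Only voxels that are reliably active under both presentation modes survive the conjunction, which is why such an analysis can point to a modality-independent (supramodal) hub. The priming contrast, in turn, flags regions whose response drops when the same syllable recurs within a block.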