Synchronizing motor responses to metrical sensory rhythms is key to social activities such as group singing and dancing. It remains unclear, however, whether a common neural network supports motor synchronization to metrical rhythms from different sensory modalities. Here, we separate sensorimotor responses from basic sensory responses by combining a metrical sensorimotor synchronization task with frequency-domain magnetoencephalography (MEG) analysis. A common frontal-temporal network, not including visual cortex, is observed during both visual- and auditory-motor synchronization, and the network persists in congenitally deaf participants during visual-motor synchronization, suggesting that the network is formed by intrinsic cortical connections rather than by auditory experience. Furthermore, activation of the left and right frontal-temporal areas, as well as the ipsilateral white matter connection, separately predicts the precision of auditory and visual synchronization. These results reveal a common but lateralized frontal-temporal network for visual- and auditory-motor synchronization that arises from intrinsic cortical connections.