The ability to detect the common location of multisensory stimulation is essential for us to perceive a coherent environment, to represent the interface between the body and the external world, and to act on sensory information. Regarding the tactile environment "at hand", we need to represent somatosensory stimuli impinging on the skin surface in the same spatial reference frame as distal stimuli, such as those transduced by vision and audition. Across two experiments, we investigated whether 6-month-old (n = 14; Experiment 1) and 4-month-old (n = 14; Experiment 2) infants were sensitive to the colocation of tactile and auditory signals delivered to the hands. We recorded infants' visual preferences for spatially congruent and incongruent auditory-tactile events. At 6 months, infants looked longer toward incongruent stimuli, whilst at 4 months infants looked longer toward congruent stimuli. Thus, even from 4 months of age, infants are sensitive to the colocation of simultaneously presented auditory and tactile stimuli. We conclude that 4- and 6-month-old infants can represent auditory and tactile stimuli in a common spatial frame of reference. We explain the age-wise shift in infants' preferences from congruent to incongruent in terms of an increased preference for novel crossmodal spatial relations based on the accumulation of experience. A comparison of looking preferences across the congruent and incongruent conditions with a unisensory control condition indicates that, at least by 6 months of age, the ability to perceive auditory-tactile colocation is based on a crossmodal rather than a supramodal spatial code.