Recalibration of perceived simultaneity has been widely accepted to minimise delay between multisensory signals owing to different physical and neural conduct times. With concurrent exposure, temporal recalibration is either contextually or spatially based. Context-based recalibration was recently described in detail, but evidence for space-based recalibration is scarce. In addition, the competition between these two reference frames is unclear. Here, we examined participants who watched two distinct blob-and-tone couples that laterally alternated with one asynchronous and the other synchronous and then judged their perceived simultaneity and sequence when they swapped positions and varied in timing. For low-level stimuli with abundant auditory location cues space-based aftereffects were significantly more apparent (8.3%) than context-based aftereffects (4.2%), but without such auditory cues space-based aftereffects were less apparent (4.4%) and were numerically smaller than context-based aftereffects (6.0%). These results suggested that stimulus level and auditory location cues were both determinants of the recalibration frame. Through such joint judgments and the simple reaction time task, our results further revealed that criteria from perceived simultaneity to successiveness profoundly shifted without accompanying perceptual latency changes across adaptations, hence implying that criteria shifts, rather than perceptual latency changes, accounted for space-based and context-based temporal recalibration.