The integration of visual and auditory inputs in the human brain works properly only if the components are perceived in close temporal proximity. In the present study, we quantified cross-modal interactions in the human brain for audiovisual stimuli with temporal asynchronies, using a paradigm from rhythm perception. In this method, participants had to align the temporal position of a target in a rhythmic sequence of four markers. In the first experiment, target and markers consisted of a visual flash or an auditory noise burst, and all four combinations of target and marker modalities were tested. In the same-modality conditions, no temporal biases and a high precision of the adjusted temporal position of the target were observed. In the different-modality conditions, we found a systematic temporal bias of 25-30 ms. In the second part of the first and in a second experiment, we tested conditions in which audiovisual markers with different stimulus onset asynchronies (SOAs) between the two components and a visual target were used to quantify temporal ventriloquism. The adjusted target positions varied by up to about 50 ms and depended in a systematic way on the SOA and its proximity to the point of subjective synchrony. These data allowed testing different quantitative models. The most satisfying model, based on work by Maij, Brenner, and Smeets (Journal of Neurophysiology 102, 490-495, 2009), linked temporal ventriloquism and the percept of synchrony and was capable of adequately describing the results from the present study, as well as those of some earlier experiments.