The human brain processes different aspects of the surrounding environment through multiple sensory modalities, and each modality can be subdivided into multiple attribute-specific channels. When the brain rebinds sensory content information ('what') across different channels, temporal coincidence ('when'), along with spatial coincidence ('where'), provides a critical clue. However, it remains unknown whether the neural mechanisms for binding synchronous attributes are specific to each attribute combination, or universal and central. In human psychophysical experiments, we examined how combinations of visual, auditory and tactile attributes affect the temporal frequency limit of synchrony-based binding. The results indicated that the upper limits of cross-attribute binding were lower than those of within-attribute binding, and surprisingly similar for any combination of visual, auditory and tactile attributes (2–3 Hz). These limits are unlikely to be the limits for judging synchrony, since the temporal limit of a cross-attribute synchrony judgement was higher and varied with the modality combination (4–9 Hz). These findings suggest that cross-attribute temporal binding is mediated by a slow central process that combines separately processed 'what' and 'when' properties of a single event. While the synchrony performance reflects temporal bottlenecks existing in 'when' processing, the binding performance reflects the central temporal limit of integrating 'when' and 'what' properties.