In two experiments, we studied the temporal dynamics of feature integration with auditory (Experiment 1) and audiovisual (Experiment 2) stimuli and manual responses. Consistent with previous observations, performance was better when the second of two consecutive stimuli shared all or none of the features of the first than when only one of the features overlapped. Comparable partial-overlap costs were obtained for combinations of stimulus features and responses. These effects decreased systematically with increasing time between the two stimulus-and-response events, and the rate of decrease was comparable for unimodal and multimodal bindings. Overall effect size reflected the degree of task relevance of the dimension or modality of the respective feature, but the effects of relevance and of temporal delay did not interact. This suggests that the processing of stimuli in task-relevant sensory modalities and on task-relevant feature dimensions is facilitated by task-specific attentional sets, whereas the temporal dynamics might reflect that bindings "decay" or become more difficult to access over time.