Responding to a stimulus leads to the integration of response and stimulus’ features into an event file. Upon repetition of any of its features, the previous event file is retrieved, thereby affecting ongoing performance. Such integration-retrieval explanations exist for a number of sequential tasks (that measure these processes as ’binding effects’) and are thought to underlie all actions. However, based on attentional orienting literature, Schöpper, Hilchey, et al. (2020) could show that binding effects are absent when participants detect visual targets in a sequence: In visual detection performance, there is simply a benefit for target location changes (inhibition of return). In contrast, Mondor and Leboe (2008) had participants detect auditory targets in a sequence, and found a benefit for frequency repetition – presumably reflecting a binding effect in auditory detection performance. In the current study, we conducted two experiments, that only differed in the modality of the target: Participants signaled the detection of a sound (N = 40) or of a visual target (N = 40). Whereas visual detection performance showed a pattern incongruent with binding assumptions, auditory detection performance revealed a non-spatial feature repetition benefit, suggesting that frequency was bound to the response. Cumulative reaction time distributions indicated that the absence of a binding effect in visual detection performance was not caused by overall faster responding. The current results show a clear limitation to binding accounts in action control: Binding effects are not only limited by task demands, but can entirely depend on target modality.