Simultaneous masking refers to the impairment of performance on a visual target by simultaneously presented flankers. Whereas the spatial aspects of simultaneous masking have been studied extensively, the time course of these spatial influences is much less well understood. Here we measure response latency and accuracy in a simultaneous masking paradigm and apply event history analysis to study the time course of target-flanker interactions. In our experiments, we presented a central target vernier flanked on both sides by 12 aligned distractor verniers that were shorter than, longer than, or equal in length to the target (Experiment 1), and that were also congruent or incongruent with the target in their spatial offset (Experiment 2). Response time distributions showed more fast responses when the target was flanked by short flankers. Conditional accuracy functions showed that response accuracy dropped when the flankers had the same length as the target, but only for slow responses. These results are at odds with accounts based solely on lateral neural interactions or on response competition. Instead, they suggest that top-down visual object-to-feature interference occurs when the target is not selected fast enough, consistent with object substitution theory.
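The two event history measures named above can be illustrated with a minimal sketch, assuming the common discrete-time formulation: the hazard h(t) = P(T = t | T ≥ t) is the proportion of still-unresponded trials that end with a response in time bin t, and the conditional accuracy ca(t) = P(correct | T = t) is the proportion of those responses that are correct. The function name `hazard_and_ca`, the 100 ms bin width, and the cutoff `max_rt` (trials with no response before it are treated as right-censored) are hypothetical choices for illustration, not the analysis code or parameters used in these experiments.

```python
from typing import List, Optional, Sequence, Tuple


def hazard_and_ca(
    rts: Sequence[Optional[float]],
    correct: Sequence[bool],
    bin_width: float = 100.0,  # hypothetical bin width (ms)
    max_rt: float = 1000.0,    # hypothetical censoring cutoff (ms)
) -> List[Tuple[int, float, float]]:
    """Return (risk_set, hazard, conditional_accuracy) for each time bin.

    Bin j covers [j * bin_width, (j + 1) * bin_width). Trials with no
    response (None) or rt >= max_rt are right-censored: they stay in the
    risk set for every bin and never count as an event.
    """
    n_bins = int(max_rt // bin_width)
    events = [0] * n_bins  # responses occurring in each bin
    hits = [0] * n_bins    # correct responses occurring in each bin
    for rt, ok in zip(rts, correct):
        if rt is None or rt >= max_rt:
            continue  # censored trial
        j = int(rt // bin_width)
        events[j] += 1
        hits[j] += int(ok)

    results = []
    at_risk = len(rts)  # every trial is at risk in the first bin
    for j in range(n_bins):
        # Hazard: events in bin j divided by trials still at risk.
        h = events[j] / at_risk if at_risk else float("nan")
        # Conditional accuracy: correct events divided by all events in bin j.
        ca = hits[j] / events[j] if events[j] else float("nan")
        results.append((at_risk, h, ca))
        at_risk -= events[j]  # responders leave the risk set
    return results


if __name__ == "__main__":
    table = hazard_and_ca(
        [150, 250, 260, 420, 1200],
        [True, True, False, True, True],
        bin_width=100,
        max_rt=500,
    )
    for j, (n, h, ca) in enumerate(table):
        print(f"bin {j}: at risk={n}, h={h:.2f}, ca={ca:.2f}")
```

Plotting h(t) and ca(t) per flanker condition, rather than only mean RT and overall accuracy, is what allows effects such as an accuracy drop that is confined to slow responses to show up.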