When observers search for varying targets in the environment, a target template has to be maintained in visual working memory (VWM). Recently, we showed that search-irrelevant features of a VWM template bias attention in an object-based manner, so that objects sharing such features with a VWM template capture the eyes involuntarily. Here, we investigated whether target-distractor similarity modulates capture strength. Participants saccaded to a target accompanied by a distractor. In each trial, a cue indicated the single feature (e.g., shape) that defined the target, and the cue also varied in one irrelevant feature (e.g., color). The distractor matched the cue's irrelevant feature in half of the trials. Nine experiments showed that target-distractor similarity consistently influenced the degree of oculomotor capture. High target-distractor dissimilarity in the search-relevant feature reduced capture by the irrelevant feature (Experiments 1, 3, 6, 7). However, capture was reduced by high target-distractor similarity in the search-irrelevant feature (Experiments 1, 4, 5, 8). Strong oculomotor capture was observed if target-distractor similarity was reasonably low in the relevant feature and high in the irrelevant feature, irrespective of whether color or shape was relevant (Experiments 2 and 5). These findings argue for involuntary, object-based, top-down control by VWM templates, but the manifestation of this control in oculomotor capture depends crucially on target-distractor similarity in the relevant and irrelevant feature dimensions of the search object.