The label-feedback hypothesis states that language can modulate visual processing. In particular, hearing or reading aloud target names (labels) speeds up performance in visual search tasks by facilitating target detection; this advantage is often measured against a condition in which the target name is presented visually (i.e. via the same modality as the search task). The current study conceptually complements and extends previous investigations. The effect of a multimodal label presentation (i.e. an audio+visual, AV, priming label) in a visual search task is compared to that of one multimodal (i.e. white noise+visual, NV, label) and two unimodal (i.e. audio-only, A, or visual-only, V, label) control conditions. The name of a category (i.e. a label at the superordinate level) is used as a cue, instead of the more commonly used target name (a basic-level label), with targets belonging to one of three categories: garments, improper weapons, and proper weapons. These categories differ in their structure: improper weapons form an ad hoc (i.e. context-dependent) category, unlike proper weapons and garments. The preregistered analysis shows an overall facilitation of visual search performance in the AV condition compared to the NV condition, confirming that the label-feedback effect may not be explained away by multimodal stimulation alone and that it extends to superordinate labels. Moreover, exploratory analyses show that this facilitation is driven by the garments and proper weapons categories, rather than by improper weapons. Thus, the superordinate label-feedback effect is modulated by the structural properties of a category. These findings are consistent with the idea that the AV condition prompts an "up-regulation" of the label, a requirement for enhancing the label's beneficial effects, but not when the label refers to an ad hoc category. They also highlight the peculiar status of the category of improper weapons, setting it apart from that of proper weapons.