Associative relations among words, concepts and percepts are the core building blocks of high-level cognition. When viewing the world ‘at a glance’, the associative relations between objects in a scene, or between an object and its visual background, are extracted rapidly. The extent to which such relational processing requires attentional capacity, however, has been heavily disputed over the years. In the present manuscript, I review studies investigating scene–object and object–object associative processing. I then present a series of studies in which I assessed the necessity of spatial attention to various types of visual–semantic relations within a scene. Importantly, in all studies, the spatial and temporal aspects of visual attention were tightly controlled in an attempt to minimize unintentional attention shifts from ‘attended’ to ‘unattended’ regions. Pairs of stimuli—either objects, scenes or a scene and an object—were briefly presented on each trial, while participants were asked to detect a pre-defined target category (e.g., an animal, a nonsense shape). Response times (RTs) to the target detection task were registered when visual attention spanned both stimuli in a pair vs. when attention was focused on only one of two stimuli. Among non-prioritized stimuli that were not defined as to-be-detected targets, findings consistently demonstrated rapid associative processing when stimuli were fully attended, i.e., shorter RTs to associated than unassociated pairs. Focusing attention on a single stimulus only, however, largely impaired this relational processing. Notably, prioritized targets continued to affect performance even when positioned at an unattended location, and their associative relations with the attended items were well processed and analyzed. Our findings portray an important dissociation between unattended task-irrelevant and task-relevant items: while the former require spatial attentional resources in order to be linked to stimuli positioned inside the attentional focus, the latter may influence high-level recognition and associative processes via feature-based attentional mechanisms that are largely independent of spatial attention.