Traditional research on attentional control has largely focused on single senses and on the importance of one's behavioural goals in controlling attentional selection, which limits its generalizability to real-world contexts. These contexts are inherently multisensory and contain regularities that also contribute to attentional control. To better understand how attention is controlled in the real world, we investigated how visual attentional capture was impacted by top-down goals (indexed by task-set contingent attentional capture) and the multisensory nature of stimuli, as well as by top-down contextual factors such as semantic relationships and the temporal predictability of stimulus onset. Participants performed a multisensory version of the Folk et al. (1992) spatial cueing paradigm while 129-channel event-related potentials (ERPs) were recorded. Reaction-time spatial cueing served as a behavioural measure of attentional control, while the N2pc ERP component was analysed both canonically and using a multivariate electrical neuroimaging (EN) framework. Behaviourally, target-congruent colour distractors captured attention more strongly when they were simultaneous than when they were semantically congruent (nontarget-congruent colour distractors failed to capture attention), with no behavioural evidence for context modulating multisensory enhancements of capture. However, our EN analyses revealed context-based influences on attention to both visual and multisensory distractors, both in how strongly they activated brain networks and in the type of brain networks activated. In both cases, these context-driven modulations of brain responses occurred early (long before the traditional N2pc time-window), with network-based modulations at approximately 30 ms post-distractor, followed by strength-based modulations at approximately 100 ms post-distractor. Our findings reveal that, in naturalistic settings, meaning might be, next to predictions (spatial, temporal, etc.), a second important source of contextual information utilised to facilitate goal-directed attention. In such settings, attentional selection is controlled by an interaction of one's goals, the stimulus's perceptual (multisensory-driven) salience, and an interaction between stimulus meaning and its predictability. Our study demonstrates how jointly investigating traditional, lab-studied control mechanisms and processes more typical of everyday life reveals a complex interplay between goal-, stimulus- and context-based processes in attentional control. As such, our findings call for a revision of traditional models of visual attentional control to account for the role of both contextual and multisensory control mechanisms.