The present study used an inhibition of return (IOR) spatial cueing paradigm to examine how gaze direction and head orientation modulate attention capture by human faces. Target response time (RT) was measured after the presentation of a peripheral cue, which was either a face (with frontal or averted gaze, in either frontal or averted head view) or a house (control). Participants fixated on a central cross at all times and responded via button press to a peripheral target presented at a variable stimulus onset asynchrony (SOA) after the cue. At the shortest SOA (150 ms), RTs were shorter for faces than for houses, independent of an IOR response, suggesting a cue-based RT advantage elicited by faces. At the longest SOA (2,400 ms), IOR magnitude was larger for faces than for houses. Both the cue-based RT advantage and the later IOR response were modulated by gaze-head congruency; these effects were strongest for frontal gaze faces in frontal head view and for averted gaze faces in averted head view. Importantly, participants were not given any specific information regarding the stimuli, nor were they told the true purpose of the study. These findings indicate that the congruent combination of head and gaze direction influences exogenous attention capture by faces during inhibition of return.