The ability to adaptively follow conspecific eye movements is crucial for establishing shared attention and survival. Indeed, in humans, interacting with the gaze direction of others causes the reflexive orienting of attention and the faster object detection of the signaled spatial location. The behavioral evidence of this phenomenon is called gaze-cueing. Although this effect can be conceived as automatic and reflexive, gaze-cueing is often susceptible to context. In fact, gaze-cueing was shown to interact with other factors that characterize facial stimulus, such as the kind of cue that induces attention orienting (i.e., gaze or non-symbolic cues) or the emotional expression conveyed by the gaze cues. Here, we address neuroimaging evidence, investigating the neural bases of gaze-cueing and the perception of gaze direction and how contextual factors interact with the gaze shift of attention. Evidence from neuroimaging, as well as the fields of non-invasive brain stimulation and neurologic patients, highlights the involvement of the amygdala and the superior temporal lobe (especially the superior temporal sulcus (STS)) in gaze perception. However, in this review, we also emphasized the discrepancies of the attempts to characterize the distinct functional roles of the regions in the processing of gaze. Finally, we conclude by presenting the notion of invariant representation and underline its value as a conceptual framework for the future characterization of the perceptual processing of gaze within the STS.