When we engage in physical contact, our emotional response depends not only on fine-grained sensory details and the receptive properties of the skin but also on contextual cues related to the situation and interpersonal dynamics. The consensus is that the affective experience of social touch is shaped by a combination of ascending somatosensory information mediated by C-tactile (CT) afferents and modulatory, top-down information. The question we pose here is whether, in the absence of somatosensory input, multisensory cues alone can suffice to create a genuinely pleasant, authentic, and engaging experience in virtual reality. The study explores how affective touch is perceived in immersive virtual environments, considering the different social norms of a neutral setting and of a setting such as a physiotherapy room, where the touch provider is a healthcare professional. We conducted an experiment with 58 healthy male and female adults, using a within-group counterbalanced design featuring two factors: (a) visuo-tactile affective touch, and (b) visual-only affective touch. Findings, drawn from questionnaires and physiological data, shed light on how contextual factors influence implicit engagement, self-reported embodiment, and co-presence, as well as the perceived realism and pleasantness of the touch experience. In line with the literature, our findings indicate that haptic feedback is essential for experiencing the advantages of touch in immersive virtual worlds, as visual input alone may not be adequate to fully realize the benefits of interpersonal touch. Furthermore, contrary to our hypothesis, a less ambiguous context (specifically, the physiotherapy room and touch from a physiotherapist) is not linked to heightened touch pleasantness.