Emotion recognition from facial visual signals is a challenge that has attracted substantial interest over the past two decades, as researchers attempt to teach computers to better understand a person's emotional state. Reliable emotion recognition can greatly enrich human-computer interaction. Because emotions are intricate, we need a representation that covers the full spectrum displayed by humans; a multi-dimensional model comprising valence (how positive or negative an emotion is) and arousal (how calming or exciting it is) is a good fit. Virtual Reality (VR), a fully immersive computer-generated environment, has witnessed significant growth in recent years and has a wide range of applications, including in mental health, such as exposure therapy and the self-attachment technique. In this paper, we address the problem of emotion recognition when the user is immersed in VR. Understanding emotions from facial cues is in itself a demanding task, and it becomes even harder when a VR headset is worn, since the headset occludes the upper half of the face. We address this issue by introducing EmoFAN-VR, a deep neural network architecture that analyses facial affect with high accuracy despite the severe occlusion caused by a VR headset. We simulate an occlusion representing a VR headset and apply it to all datasets used in this work. EmoFAN-VR predicts both discrete and continuous emotions in a single step, making it suitable for real-time deployment. We fine-tune our network on the AffectNet dataset under VR occlusion and test it on the AFEW-VA dataset, setting a new baseline for AFEW-VA under VR-headset occlusion.
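To make the occlusion protocol concrete, the sketch below shows one plausible way to simulate a VR headset on a face crop by blacking out its upper region. The exact headset geometry and coverage used in the paper are not given here; the `apply_vr_occlusion` helper, the default cover fraction, and the rectangular mask shape are illustrative assumptions, not the authors' specification.

```python
import numpy as np

def apply_vr_occlusion(face_img: np.ndarray, cover_fraction: float = 0.5) -> np.ndarray:
    """Black out the upper part of a face crop to approximate a VR headset.

    face_img       -- H x W x C uint8 array, already cropped to the face box.
    cover_fraction -- fraction of the crop height hidden by the simulated
                      headset (0.5 roughly covers eyes and brows; this value
                      is an assumption, not taken from the paper).
    """
    occluded = face_img.copy()
    cut = int(round(occluded.shape[0] * cover_fraction))
    occluded[:cut, :, :] = 0  # zero out pixels where the headset would sit
    return occluded

# Example: occlude every image in an (N, H, W, C) batch before fine-tuning or testing.
# occluded_batch = np.stack([apply_vr_occlusion(img) for img in batch])
```

Applying the same masking to AffectNet (for fine-tuning) and AFEW-VA (for testing) keeps the occlusion consistent across training and evaluation.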