Cross-modal interaction between vision and the other senses is a key part of how we perceive the real world. Strong stimulation of hearing, smell, taste or touch can reduce the cognitive resources the brain is able to allocate to sight, and thus limit what the Human Visual System (HVS) can actually perceive at that moment. Selective rendering exploits this knowledge of the HVS: the parts of a virtual environment a viewer is attending to are rendered at high quality and the rest of the scene at a much lower quality, which substantially reduces rendering time without the viewer being aware of the quality difference. This paper investigates how the presence of sound, smell and ambient temperature in a virtual environment affects a viewer's ability to perceive the quality of the graphics used for that environment. Experiments were run with a total of 356 participants to determine the graphics quality thresholds across the different cross-modal interactions. The results revealed a significant effect of strong perfume, high temperature and audio noise on perceived rendering quality. Under given conditions, this particular combination of modalities can thus be exploited when rendering virtual environments to substantially reduce rendering time without any loss in the visual quality the user perceives.