Fellow, IEEE, Qi Sun

[Fig. 1 panels: (a) Edge view, (b) Cloud data, (c) Uniform streaming, (d) Quality for (c), (e) Fixation-based streaming, (f) Quality for (e), (g) Saccade-based streaming, (h) Quality for (g).]

Fig. 1. Our gaze-contingent immersive 3D asset streaming interface. Starting from the partially streamed 3D assets rendered on the edge at a given time (a), our method streams additional updates from the cloud to the edge so that the rendering becomes perceptually closer to the full assets stored on the cloud (b). Standard uniform streaming (c) updates all visible assets in the scene evenly, causing suboptimal perceptual quality in (d), which visualizes both the temporal (popping with respect to (a)) and the spatial (quality with respect to (b)) perceptual errors; brighter colors indicate worse artifacts. Our method, in contrast, optimizes the subset of assets streamed to the edge for better spatio-temporal quality under the same network bandwidth. Our perceptual model accounts for both eccentricity-based acuity during fixation (e) and temporal masking during eye movements (g). If the user fixates (e), our model prioritizes regions near the gaze point (green circle) while reducing potential popping artifacts. If a saccade is detected (g), our model can safely ignore popping and stream more aggressive updates for further quality improvement (h).
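The gaze-contingent prioritization the caption describes can be sketched as a scoring rule: during fixation, an update's quality gain is weighted by eccentricity-based acuity and penalized by its popping cost; during a saccade, temporal masking hides popping, so only the quality gain matters. This is a hypothetical illustration, not the paper's actual model; the function name, the exponential acuity falloff, and all parameters are assumptions.

```python
import math

def stream_priority(eccentricity_deg, popping_cost, quality_gain,
                    in_saccade, acuity_falloff=0.05, popping_weight=1.0):
    """Score one candidate asset update; higher scores are streamed first.

    Hypothetical sketch of the caption's idea: eccentricity-based acuity
    during fixation, temporal (saccadic) masking during eye movement.
    """
    if in_saccade:
        # Saccadic masking: popping is invisible, so stream aggressively
        # based on quality gain alone.
        return quality_gain
    # Fixation: visual acuity falls off with angular distance from the
    # gaze point, so peripheral updates contribute less perceived quality.
    acuity = math.exp(-acuity_falloff * eccentricity_deg)
    return acuity * quality_gain - popping_weight * popping_cost
```

Under this rule, a foveal update outranks an identical peripheral one during fixation, while a detected saccade lets the same peripheral update stream at full priority.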