As virtual reality display technologies advance, resolutions and refresh
rates continue to approach human perceptual limits, presenting a
challenge for real-time rendering algorithms. Neural super-resolution is
a promising way to reduce computation cost and improve the visual
experience by upscaling low-resolution renderings. However, the added
workload of running neural networks cannot be neglected. In this paper,
we aim to alleviate this burden by exploiting the foveated nature of the
human visual system, in which acuity decreases rapidly from the focal point
to the periphery. With the help of dynamic and geometric information
(i.e., pixel-wise motion vectors, depth, and camera transformation)
inherently available in real-time rendering content, we propose a
neural accumulator that recurrently aggregates, from frame to frame, the
low-resolution visual information rendered in an amortized manner. By
leveraging a partition-assemble scheme, we use a neural super-resolution
module to upsample low-resolution image tiles at different quality levels
according to their perceptual importance and reconstruct the final
output heterogeneously. Our method generates perceptually high-fidelity
foveated high-resolution frames in real time, surpassing the
quality of other foveated super-resolution methods.
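
As an illustration of how tiles might be ranked by perceptual importance in a partition-assemble scheme, the following minimal Python sketch partitions a frame into square tiles and assigns each tile a quality level based on the distance of its center from the gaze point. The function name `assign_tile_levels`, the tile size, and the distance thresholds in `radii` are hypothetical choices made for this example and are not taken from the proposed method, which may measure importance differently (for example, by visual eccentricity in degrees rather than pixel distance).

```python
import math


def assign_tile_levels(width, height, tile, gaze, radii):
    """Map each tile index (col, row) to a quality level.

    Level 0 is the highest quality (closest to the gaze point);
    level len(radii) is the lowest. `radii` holds hypothetical
    pixel-distance thresholds in ascending order.
    """
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    levels = {}
    for row in range(rows):
        for col in range(cols):
            # Center of the tile in pixel coordinates.
            cx = (col + 0.5) * tile
            cy = (row + 0.5) * tile
            dist = math.hypot(cx - gaze[0], cy - gaze[1])
            # The level is the number of thresholds the distance exceeds.
            levels[(col, row)] = sum(dist > r for r in radii)
    return levels


# Example: a 256x128 frame, 64-pixel tiles, gaze near the top-left corner.
levels = assign_tile_levels(256, 128, 64, gaze=(32, 32), radii=(64, 128))
```

In such a sketch, tiles at level 0 would be passed to the highest-quality upsampling path and higher levels to progressively cheaper ones before the upsampled tiles are assembled into the output frame.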