As virtual reality display technologies advance, resolutions and refresh rates continue to approach human perceptual limits, posing a challenge for real-time rendering algorithms. Neural super-resolution is a promising way to reduce rendering cost and improve the visual experience by upscaling low-resolution renderings. However, the added workload of running neural networks cannot be neglected. In this article, we alleviate this burden by exploiting the foveated nature of the human visual system: since visual acuity falls off rapidly from the focal point to the periphery, we upscale the coarse input heterogeneously rather than applying uniform super-resolution. Using the dynamic and geometric information inherently available in real-time rendered content (i.e., pixel-wise motion vectors, depth, and camera transformations), we propose a neural accumulator that recurrently aggregates the amortized low-resolution visual information from frame to frame. Through a partition-and-assemble scheme, a neural super-resolution module upsamples the low-resolution image tiles to different qualities according to their perceptual importance and adaptively reconstructs the final output. Our method generates perceptually high-fidelity foveated high-resolution frames in real time, surpassing the quality of other foveated super-resolution methods.
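To make the partition-and-assemble idea concrete, the following is a minimal PyTorch sketch, not the paper's actual architecture: tiles near the gaze point are routed through a toy learned upsampler, while peripheral tiles take a cheap bilinear path before being reassembled into one frame. The names `ToyNeuralUpsampler` and `foveated_upsample`, the tile size, and the eccentricity threshold are all illustrative assumptions, and the recurrent accumulation step is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyNeuralUpsampler(nn.Module):
    """Placeholder for a learned 2x super-resolution module (assumption)."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels * 4, 3, padding=1),
            nn.PixelShuffle(2),  # rearranges channels into a 2x larger image
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def foveated_upsample(
    lr: torch.Tensor,            # low-res frame, shape (1, C, H, W)
    gaze_uv: tuple,              # gaze point in normalized [0, 1] coordinates
    upsampler: nn.Module,
    tile: int = 32,              # tile size in low-res pixels (assumption)
    fovea_radius: float = 0.25,  # normalized eccentricity cutoff (assumption)
) -> torch.Tensor:
    """Partition into tiles, upsample each by perceptual importance, assemble."""
    _, c, h, w = lr.shape
    out = torch.empty(1, c, h * 2, w * 2, device=lr.device)
    gx, gy = gaze_uv[0] * w, gaze_uv[1] * h
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            patch = lr[:, :, y0:y0 + tile, x0:x0 + tile]
            # Eccentricity of the tile center relative to the gaze point.
            cy, cx = y0 + patch.shape[2] / 2, x0 + patch.shape[3] / 2
            ecc = ((cx - gx) ** 2 + (cy - gy) ** 2) ** 0.5 / max(h, w)
            if ecc < fovea_radius:
                up = upsampler(patch)  # foveal tile: neural path
            else:
                up = F.interpolate(patch, scale_factor=2, mode="bilinear",
                                   align_corners=False)  # periphery: cheap path
            out[:, :, y0 * 2:(y0 + patch.shape[2]) * 2,
                x0 * 2:(x0 + patch.shape[3]) * 2] = up
    return out


if __name__ == "__main__":
    frame = torch.rand(1, 3, 128, 128)
    result = foveated_upsample(frame, gaze_uv=(0.5, 0.5),
                               upsampler=ToyNeuralUpsampler())
    print(result.shape)  # torch.Size([1, 3, 256, 256])
```

The per-tile routing is what saves compute relative to uniform super-resolution: only the small foveal region pays for the network, while the periphery, where acuity is low, gets an inexpensive interpolation.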