In this paper, we tackle the challenging problem of rendering real‐world 360° panorama videos that support full 6 degrees‐of‐freedom (DoF) head motion from a prerecorded omnidirectional stereo (ODS) video. In contrast to recent approaches that create novel views for individual panorama frames, we introduce a video‐specific, temporally consistent multi‐sphere image (MSI) scene representation. Given a conventional ODS video, we first estimate descriptive feature maps for each frame. We then optimize a global MSI model, drawing on theory from recent research on neural radiance fields. Instead of a continuous scene function, the MSI representation stores colour and density information only on a discrete set of concentric spheres. To further improve the temporal consistency of our results, we apply an additional refinement step that optimizes the temporal coherence between successive video frames. Direct comparisons to recent baseline approaches show that our global MSI optimization yields higher visual quality. Our code and data will be made publicly available.
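To make the multi‐sphere idea concrete, the following is a minimal sketch of how one ray could be rendered from colour and density stored on concentric spheres. It is not the authors' implementation. It assumes spheres centred at the world origin, a viewpoint inside the innermost sphere, and the standard front-to-back compositing used in the neural radiance field literature. The names `exit_distance`, `composite_ray`, and `sample_fn` are hypothetical and introduced only for illustration.

```python
import math


def exit_distance(origin, direction, radius):
    """Distance along a unit-length ray to a sphere centred at the world origin.

    Assumes the ray origin lies strictly inside the sphere, so there is
    exactly one intersection in front of the viewer.
    """
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    if c >= 0.0:
        raise ValueError("viewpoint must lie strictly inside the sphere")
    return -b + math.sqrt(b * b - c)


def composite_ray(origin, direction, radii, sample_fn):
    """Composite colour along one ray through concentric spheres.

    sample_fn(index, point) is a hypothetical lookup that returns
    ((r, g, b), density) for the sphere with the given index (spheres are
    indexed from the smallest radius outwards) at a 3D point on it.
    """
    # For a viewpoint inside every sphere, hit distances grow with the radius.
    ts = [exit_distance(origin, direction, r) for r in sorted(radii)]
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for i, t in enumerate(ts):
        point = tuple(o + t * d for o, d in zip(origin, direction))
        colour, density = sample_fn(i, point)
        # Spacing to the next sphere; the outermost sample gets a very large
        # spacing (an assumption borrowed from common NeRF implementations).
        delta = ts[i + 1] - t if i + 1 < len(ts) else 1e10
        alpha = 1.0 - math.exp(-density * delta)
        for k in range(3):
            rgb[k] += transmittance * alpha * colour[k]
        transmittance *= 1.0 - alpha
    return tuple(rgb)
```

Because the viewpoint may be anywhere inside the innermost sphere, shifting `origin` changes where each ray meets each sphere, which is what yields parallax under head motion in this kind of representation.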