“…The approach suffers from several drawbacks: it inherently incurs high memory consumption and slow training, and it produces cross-frame inconsistency (flickering). Alternative approaches such as ST-NeRF [Zhang et al 2021b], D-NeRF [Pumarola et al 2020], NeuralBody [Peng et al 2021] and HumanNeRF conduct spatial-temporal warping to map individual frames into a common canonical space, so that only a single NeRF needs to be trained. Their quality therefore relies heavily on the accuracy of the estimated warping field: when the deformation is large, or when the performer's appearance has too little or too much texture, they tend to produce strong visual artifacts.…”
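To make the canonical-space idea concrete, here is a minimal sketch in the spirit of D-NeRF-style warping, under heavy simplifying assumptions. A point `x` observed at time `t` is displaced by a deformation field into canonical coordinates, and a single static field is queried there. In the real methods both fields are learned networks; here `deformation` and `canonical_density` are hypothetical toy analytic functions chosen only for illustration, and no rendering or training is shown. The example also hints at why an inaccurate warp degrades quality: the query lands at the wrong canonical location.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]
# Hypothetical signatures: a warp maps (x, t) to a displacement, a canonical field maps x to density.
DeformFn = Callable[[Point, float], Point]
CanonicalFn = Callable[[Point], float]


def deformation(x: Point, t: float) -> Point:
    # Toy warping field (assumption): frame t shows the canonical shape shifted by t along
    # the x-axis, so the displacement back to canonical space is (-t, 0, 0).
    return (-t, 0.0, 0.0)


def canonical_density(x: Point) -> float:
    # Toy stand-in for the single canonical NeRF: a Gaussian density blob at the origin.
    return math.exp(-(x[0] ** 2 + x[1] ** 2 + x[2] ** 2))


def query_dynamic(x: Point, t: float, deform: DeformFn, canonical: CanonicalFn) -> float:
    # Warp the observed point into canonical space, then query the shared static field.
    dx = deform(x, t)
    x_canonical = (x[0] + dx[0], x[1] + dx[1], x[2] + dx[2])
    return canonical(x_canonical)


if __name__ == "__main__":
    # Center of the moving blob at t = 2: an accurate warp recovers the peak density.
    print(query_dynamic((2.0, 0.0, 0.0), 2.0, deformation, canonical_density))

    # A warp that underestimates the motion by half queries the wrong canonical point.
    def bad_deformation(x: Point, t: float) -> Point:
        return (-t / 2.0, 0.0, 0.0)

    print(query_dynamic((2.0, 0.0, 0.0), 2.0, bad_deformation, canonical_density))
```

With the accurate warp the query returns the blob's peak density of 1.0; with the half-strength warp it returns exp(-1), roughly 0.37, even though the canonical field itself is unchanged. This is a toy analogue of the dependence on warping accuracy described above.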