“…However, generating a realistic avatar that reconstructs the detailed appearance, subtle expressions, and gaze movement of a subject typically requires a dedicated capture setup and extensive manual work. Recent deep-learning-based methods [6,12,30,32,58,78,79,82,85,86,88] avoid explicit 3D avatar modeling and instead directly synthesize a talking-head video of a subject from a single source image of that subject and a driving video sequence. Elgharib et al. [18] proposed a method for warping video of a subject's face from a side view to a frontal view.…”