“…Traditional talking head generation models [49,9,47,42,60,12] focus on synthesizing audio-synchronized lip motion, but they generate that motion only with fixed head poses. To address this issue, some recent works consider personalized attributes [52,55,58,50,48,7]. However, these methods [52,55,58] generate personalized information with a deterministic model, so their results lack diversity and fall into repetitive patterns.…”