“…Deep generative models [12,17,29,34,41,45,46,53] have achieved lots of successes in image synthesis tasks. GAN based methods demonstrate amazing capability in yielding high-fidelity samples [4,17,27,44,53]. In contrast, likelihood-based methods, such as Variational Autoencoders (VAEs) [29,45], Diffusion Models [12,24,41] and Autoregressive Models [34,46], offer distribution coverage and hence can generate more diverse samples [41,45,46].…”