With the rapid development of artificial intelligence, the generation of emotionally expressive images has become a key research area. This article introduces a novel multi-stage cascade method for emotion-driven image generation that combines CGAN, Pix2Pix, and CycleGAN to create images with enhanced emotional depth and visual quality. The approach proceeds through a sequential pipeline, from emotion initialization to texture refinement and finally to style transfer. Experiments on facial and automotive datasets show a significant improvement in image quality compared to traditional models, with structural similarity (SSIM) increasing by an average of 40 percentage points and peak signal-to-noise ratio (PSNR) by an average of 11.1 percentage points. These findings highlight the potential of the model in applications such as advertising, entertainment, and human-computer interaction, where emotionally resonant visuals are crucial.
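
To make the cascade concrete, the sketch below shows one way the three stages could be chained in PyTorch. The placeholder generator classes, the emotion-label embedding, and the 64x64 output size are illustrative assumptions for this sketch, not the authors' implementation; a real pipeline would substitute pretrained CGAN, Pix2Pix, and CycleGAN generators.

```python
# Minimal sketch of the three-stage emotion-generation cascade (assumed structure):
# stage 1 (CGAN) initializes the emotion, stage 2 (Pix2Pix) refines texture,
# stage 3 (CycleGAN) applies the target style. All classes here are placeholders.
import torch
import torch.nn as nn


class PlaceholderGenerator(nn.Module):
    """Stand-in for a pretrained image-to-image generator (identity mapping)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x


class PlaceholderCGAN(nn.Module):
    """Stand-in for a conditional GAN generator: maps (noise, emotion label) to an image."""

    def __init__(self, noise_dim: int = 100, num_emotions: int = 7, out_size: int = 64):
        super().__init__()
        self.out_size = out_size
        self.embed = nn.Embedding(num_emotions, noise_dim)
        self.fc = nn.Linear(noise_dim * 2, 3 * out_size * out_size)

    def forward(self, noise: torch.Tensor, emotion: torch.Tensor) -> torch.Tensor:
        # Concatenate the noise vector with the embedded emotion label,
        # then project to a coarse RGB image.
        z = torch.cat([noise, self.embed(emotion)], dim=1)
        img = torch.tanh(self.fc(z))
        return img.view(-1, 3, self.out_size, self.out_size)


class EmotionCascade(nn.Module):
    """Chains emotion initialization -> texture refinement -> style transfer."""

    def __init__(self, cgan: nn.Module, pix2pix: nn.Module, cyclegan: nn.Module):
        super().__init__()
        self.cgan = cgan
        self.pix2pix = pix2pix
        self.cyclegan = cyclegan

    def forward(self, noise: torch.Tensor, emotion: torch.Tensor) -> torch.Tensor:
        coarse = self.cgan(noise, emotion)   # emotion initialization
        refined = self.pix2pix(coarse)       # texture refinement
        return self.cyclegan(refined)        # style transfer


if __name__ == "__main__":
    cascade = EmotionCascade(PlaceholderCGAN(), PlaceholderGenerator(), PlaceholderGenerator())
    noise = torch.randn(1, 100)
    emotion = torch.tensor([3])              # e.g. index of a "happy" class
    print(cascade(noise, emotion).shape)     # torch.Size([1, 3, 64, 64])
```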