Facial expression manipulation has attracted wide attention and has been applied in fields such as film production, electronic games, and short videos. However, existing facial expression manipulation methods often overlook the details of local facial regions and therefore fail to preserve local structures and textures. To address this problem, this paper proposes a local semantic segmentation mask-based GAN (LSGAN) that generates fine-grained facial expression images. LSGAN is composed of a semantic mask generator, an adversarial autoencoder, a transformative generator, and an AU-intensity discriminator. The semantic mask generator produces eye, mouth, and cheek masks for a face image. The transformative generator then integrates the target expression labels with the features of the corresponding facial regions to generate a vivid image of the target facial expression. In this way, expressions are captured explicitly from the face regions in which they appear. Furthermore, the AU-intensity discriminator is designed to capture facial expression variations and to evaluate the quality of the generated images. Extensive experiments demonstrate that our method produces authentic face images with accurate facial expressions and outperforms state-of-the-art methods both qualitatively and quantitatively.
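
To make the data flow concrete, the following is a minimal sketch of how region masks and a target expression label might be combined into a single generator input. It is an illustration under stated assumptions, not the LSGAN implementation: the function names `apply_mask`, `region_mean`, and `build_generator_input` are hypothetical, the masks are hand-written rather than produced by a mask generator, and mean pixel intensity stands in for the learned region features.

```python
from typing import Dict, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities
Mask = List[List[int]]     # binary mask with the same shape as the image

REGIONS = ("eye", "mouth", "cheek")  # the three regions named in the abstract


def apply_mask(image: Image, mask: Mask) -> Image:
    """Keep pixels where the mask is 1 and zero out everything else."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def region_mean(image: Image, mask: Mask) -> float:
    """Mean intensity inside the masked region (0.0 for an empty mask).

    Assumption: this toy statistic replaces the learned region features.
    """
    total = sum(p * m
                for img_row, mask_row in zip(image, mask)
                for p, m in zip(img_row, mask_row))
    area = sum(sum(row) for row in mask)
    return total / area if area else 0.0


def build_generator_input(image: Image,
                          masks: Dict[str, Mask],
                          target_label: List[float]) -> List[float]:
    """Concatenate one toy feature per region with the target label."""
    region_features = [region_mean(image, masks[name]) for name in REGIONS]
    return region_features + list(target_label)


# Hypothetical 4x4 "face" and hand-written masks, for illustration only.
image = [
    [0.2, 0.4, 0.4, 0.2],
    [0.5, 0.5, 0.5, 0.5],
    [0.1, 0.8, 0.8, 0.1],
    [0.0, 0.6, 0.6, 0.0],
]
masks = {
    "eye":   [[0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    "cheek": [[0, 0, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
    "mouth": [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
}
target_label = [0.0, 1.0]  # hypothetical AU intensities for two action units

mouth_only = apply_mask(image, masks["mouth"])  # image restricted to the mouth
features = build_generator_input(image, masks, target_label)
print(features)
```

In a trained model the per-region statistic would be replaced by feature maps from an encoder, and the concatenated vector would condition the generator; the sketch only shows the order of operations of masking, region-wise feature extraction, and label fusion.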