Face attribute editing, an important research direction in face image synthesis and processing, aims to photorealistically edit single or multiple attributes of a face image on demand using editing and generation models. Most existing methods are based on generative adversarial networks, using target attribute vectors to control the editing region or Gaussian noise as conditional input to capture texture details. However, these methods struggle to preserve attribute consistency in irrelevant regions, and the fidelity of the generated images is also limited. In this paper, we propose a method that fuses attribute feature maps into an optimized latent space while making full use of conditional information as an additional constraint. In the image generation phase, we then use a progressive architecture to perform controlled editing of face attributes at different granularities. Finally, we conduct an ablation study on the selected training scheme to further demonstrate the stability and accuracy of our method. Experiments show that our proposed approach, built on an end-to-end progressive image translation network architecture, achieves strong face image editing results on both the FID and LPIPS metrics.