Deep learning has brought unprecedented progress to image inpainting. However, existing methods often generate images with blurry textures and distorted structures because they fail either to maintain semantic consistency or to restore fine-grained textures. In this paper, we propose a two-stage adversarial model that improves the structural accuracy and detail of image inpainting. Our model splits the inpainting task into two parts: a semantic structure reconstructor and a texture generator. In the first stage, we use semantic structure maps obtained by unsupervised segmentation to train the semantic structure reconstructor, which completes the missing structures of the input and maintains consistency between the missing region and the overall image. In the second stage, we introduce a spatial-channel attention (SCA) module to generate fine-grained textures. The SCA module strengthens the model's ability to aggregate information from long-distance pixels and across channels. Furthermore, we propose a spatial-channel loss that stabilizes network training and improves visual quality. Finally, we evaluate our model on the publicly available CelebA, Places2, and Paris StreetView datasets. Experimental results show that for inpainting tasks involving large missing regions or complex structures, our method achieves higher inpainting quality than existing state-of-the-art approaches.

INDEX TERMS Artificial neural networks, deep learning, generative model, image generation, image inpainting.
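The abstract describes the SCA module as attending over both distant spatial positions and feature channels. As a rough illustration only, and not the paper's actual implementation, the following PyTorch sketch combines a squeeze-and-excitation-style channel gate with a non-local spatial-attention branch; the class and parameter names (SpatialChannelAttention, reduction, gamma) are hypothetical.

```python
# Hypothetical sketch of a spatial-channel attention (SCA) block: a
# channel-attention gate followed by non-local spatial attention, so
# that features are reweighted across channels and every position can
# draw texture information from distant pixels. Not the paper's module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel branch: global average pool -> bottleneck MLP -> sigmoid gate.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial branch: 1x1 projections for query/key/value (non-local style).
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Channel attention: reweight each feature map.
        x = x * self.channel_mlp(x)
        # Spatial attention: each position attends to all others, letting
        # long-distance pixels exchange information.
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.key(x).flatten(2)                     # (b, c', hw)
        attn = F.softmax(q @ k, dim=-1)                # (b, hw, hw)
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out  # residual connection
```

In this sketch the block is residual, so at initialization (gamma = 0) it behaves as the identity on the channel-gated features, a common choice for stabilizing training when attention is added to a generator.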