Cross-view image translation aims to generate scene images from arbitrary viewpoints. However, because shapes and contents differ greatly across views, the quality of the generated images degrades. Small objects such as vehicles lose their clear shapes and details, which makes them structurally inconsistent with the semantic map that guides the generation process. To solve this problem, we propose a novel generative adversarial network based on a local and global information processing module (LAGGAN) to recover image details and structures. The network combines the input-view image with the target semantic segmentation map to guide the generation of the target image from another viewpoint. The proposed LAGGAN consists of a two-stage generator and a parameter-sharing discriminator, and it uses a new local and global information processing (LAG) module to generate high-quality images from various views. Moreover, we integrate dilated convolutions into the discriminator to capture the global context, which enhances its discriminative ability and further adjusts the LAG module. As a result, most semantic information is preserved, and the details of the target-view images are translated more sharply. Quantitative and qualitative evaluations on the CVUSA and Dayton datasets show that LAGGAN produces satisfactory perceptual results and is comparable to state-of-the-art methods on the cross-view image translation task.

INDEX TERMS Aerial images, cross-view image translation, generative adversarial networks (GANs), ground-level images, local and global information processing module.
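
To illustrate why dilated convolutions can help a discriminator capture global context, the following minimal sketch computes the receptive field of a stack of stride-1 convolutions with and without dilation. The layer configuration (three 3x3 layers with dilation rates 1, 2, and 4) and the helper name `receptive_field` are assumptions chosen for illustration; they are not the configuration used in LAGGAN.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field, along one axis, of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d widens the receptive
    field by (k - 1) * d pixels.
    """
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have equal length")
    rf = 1  # a single pixel sees only itself before any convolution
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


# Hypothetical three-layer stacks, for illustration only.
plain = receptive_field([3, 3, 3], [1, 1, 1])    # ordinary convolutions
dilated = receptive_field([3, 3, 3], [1, 2, 4])  # dilated convolutions

print(f"plain: {plain} px, dilated: {dilated} px")
```

With the same number of layers and parameters, the dilated stack in this example covers a 15-pixel span against 7 pixels for the plain stack, which is the sense in which dilation widens the context seen by each output unit.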