Positron emission tomography (PET) has been substantially used recently. To minimize the potential health risk caused by the tracer radiation inherent to PET scans, it is of great interest to synthesize the high-quality PET image from the low-dose one to reduce the radiation exposure. In this paper, we propose a 3D auto-context-based locality adaptive multi-modality generative adversarial networks model (LA-GANs) to synthesize the high-quality FDG PET image from the low-dose one with the accompanying MRI images that provide anatomical information. Our work has four contributions. First, different from the traditional methods that treat each image modality as an input channel and apply the same kernel to convolve the whole image, we argue that the contributions of different modalities could vary at different image locations, and therefore a unified kernel for a whole image is not optimal. To address this issue, we propose a locality adaptive strategy for multi-modality fusion. Second, we utilize 1×1×1 kernel to learn this locality adaptive fusion so that the number of additional parameters incurred by our method is kept minimum. Third, the proposed locality adaptive fusion mechanism is learned jointly with the PET image synthesis in a 3D conditional GANs model, which generates high-quality PET images by employing large-sized image patches and hierarchical features. Fourth, we apply the auto-context strategy to our scheme and propose an auto-context LA-GANs model to further refine the quality of synthesized images. Experimental results show that our method outperforms the traditional multi-modality fusion methods used in deep networks, as well as the state-of-the-art PET estimation approaches.