Alzheimer’s disease (AD) is the most common form of dementia. Currently, only symptomatic management is available, and early diagnosis and intervention are crucial for AD treatment. As a recent deep learning strategy, generative adversarial networks (GANs) are expected to benefit AD diagnosis, but their performance remains to be verified. This study provided a systematic review on the application of the GAN-based deep learning method in the diagnosis of AD and conducted a meta-analysis to evaluate its diagnostic performance. A search of the following electronic databases was performed by two researchers independently in August 2021: MEDLINE (PubMed), Cochrane Library, EMBASE, and Web of Science. The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was applied to assess the quality of the included studies. The accuracy of the model applied in the diagnosis of AD was determined by calculating odds ratios (ORs) with 95% confidence intervals (CIs). A bivariate random-effects model was used to calculate the pooled sensitivity and specificity with their 95% CIs. Fourteen studies were included, 11 of which were included in the meta-analysis. The overall quality of the included studies was high according to the QUADAS-2 assessment. For the AD vs. cognitively normal (CN) classification, the GAN-based deep learning method exhibited better performance than the non-GAN method, with significantly higher accuracy (OR 1.425, 95% CI: 1.150–1.766, P = 0.001), pooled sensitivity (0.88 vs. 0.83), pooled specificity (0.93 vs. 0.89), and area under the curve (AUC) of the summary receiver operating characteristic curve (SROC) (0.96 vs. 0.93). For the progressing MCI (pMCI) vs. stable MCI (sMCI) classification, the GAN method exhibited no significant increase in the accuracy (OR 1.149, 95% CI: 0.878–1.505, P = 0.310) or the pooled sensitivity (0.66 vs. 0.66). The pooled specificity and AUC of the SROC in the GAN group were slightly higher than those in the non-GAN group (0.81 vs. 0.78 and 0.81 vs. 0.80, respectively). The present results suggested that the GAN-based deep learning method performed well in the task of AD vs. CN classification. However, the diagnostic performance of GAN in the task of pMCI vs. sMCI classification needs to be improved.Systematic Review Registration: [PROSPERO], Identifier: [CRD42021275294].