Background: Gastric cancer (GC), a significant health burden worldwide, is typically diagnosed in the advanced stages due to its non-specific symptoms and complex morphological features. Deep learning (DL) has shown potential for improving and standardizing early GC detection. This systematic review aims to evaluate the current status of DL in pre-malignant, early-stage, and gastric neoplasia analysis. Methods: A comprehensive literature search was conducted in PubMed/MEDLINE for original studies implementing DL algorithms for gastric neoplasia detection using endoscopic images. We adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The focus was on studies providing quantitative diagnostic performance measures and those comparing AI performance with human endoscopists. Results: Our review encompasses 42 studies that utilize a variety of DL techniques. The findings demonstrate the utility of DL in GC classification, detection, tumor invasion depth assessment, cancer margin delineation, lesion segmentation, and detection of early-stage and pre-malignant lesions. Notably, DL models frequently matched or outperformed human endoscopists in diagnostic accuracy. However, heterogeneity in DL algorithms, imaging techniques, and study designs precluded a definitive conclusion about the best algorithmic approach. Conclusions: The promise of artificial intelligence in improving and standardizing gastric neoplasia detection, diagnosis, and segmentation is significant. This review is limited by predominantly single-center studies and undisclosed datasets used in AI training, impacting generalizability and demographic representation. Further, retrospective algorithm training may not reflect actual clinical performance, and a lack of model details hinders replication efforts. More research is needed to substantiate these findings, including larger-scale multi-center studies, prospective clinical trials, and comprehensive technical reporting of DL algorithms and datasets, particularly regarding the heterogeneity in DL algorithms and study designs.