Early diagnosis of esophageal cancer is of great clinical significance because it is critical for reducing mortality. At present, diagnosis relies mainly on manual detection and annotation of lesions in gastroscopic images. These procedures are very challenging for clinicians because of the large variability in the appearance of early cancer lesions. To reduce the subjectivity and fatigue of manual annotation and to improve diagnostic efficiency, computer-aided annotation methods are highly desirable. In this work, we propose a novel method that uses deep learning (DL) techniques to automatically annotate early esophageal cancer (EEC) lesions in gastroscopic images. A depth map is first extracted from each gastroscopic image by a DL network. This additional depth information is then fused with the original RGB gastroscopic image and fed to a second DL network, which produces precise annotations of the EEC regions. In total, 4231 gastroscopic images from 732 patients were used to build and validate the proposed method; 3190 of these were EEC images and the remaining 1041 were non-EEC images. The experimental results show that combining depth and RGB information improves annotation performance. The final EEC detection rate and mean Dice Similarity Coefficient (DSC) of our method were 97.54% and 74.43%, respectively. Compared with other state-of-the-art DL-based methods, the proposed method achieved better annotation performance and fewer false positives. Therefore, our method offers good prospects for aiding the clinical diagnosis of EEC.

INDEX TERMS Gastroscopic image, early esophageal cancer, lesion annotation, deep learning, depth map.
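The following is a minimal sketch of the depth-RGB fusion idea summarized above: a depth map obtained from a separate depth-estimation network is combined with the RGB gastroscopic image and passed to a segmentation network that outputs a per-pixel lesion map. The class name `FusionSegmenter`, the tiny encoder, and the simple channel-concatenation fusion are illustrative assumptions, not the architectures used in the paper.

```python
import torch
import torch.nn as nn

class FusionSegmenter(nn.Module):
    """Toy segmentation network taking an RGB image plus a 1-channel
    depth map and predicting per-pixel lesion logits (assumed design)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),  # 4 channels = 3 RGB + 1 depth
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, 1, 1)  # 1-channel lesion-mask logits

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)  # early fusion by channel concatenation
        return self.head(self.encoder(x))

# Usage example: random tensors stand in for a gastroscopic image and
# the depth map produced by the (separate) depth-estimation network.
rgb = torch.rand(1, 3, 256, 256)
depth = torch.rand(1, 1, 256, 256)
mask_logits = FusionSegmenter()(rgb, depth)
print(mask_logits.shape)  # torch.Size([1, 1, 256, 256])
```

Other fusion strategies (e.g., a separate depth branch with feature-level fusion) would fit the same two-stage pipeline; the sketch only illustrates the simplest input-level variant.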