Automatic segmentation is a crucial step in esophageal optical coherence tomography (OCT) image processing: it highlights diagnosis-related tissue layers and provides characteristics such as shape and thickness for esophageal disease diagnosis. This study proposes a dual-stage framework, named the dual-stage U-shape convolutional network (D-UCN), that uses a specifically designed encoder-decoder network configuration for accurate and reliable esophageal layer segmentation. The proposed approach uses one UCN to locate the target tissue region, followed by a second UCN with a similar architecture to produce the final segmentation. In this way, the proposed strategy effectively addresses the problems encountered in our previous studies, such as disturbance from neighboring diagnostically unrelated tissues, probe protection sheaths from the imaging equipment, and the inevitable speckle noise. Experimental results on esophageal OCT B-scans from C57BL mice demonstrate that the proposed dual-stage framework achieves performance comparable to manual segmentation. The effectiveness and advantages of the dual-stage strategy are also confirmed by comparison with graph theory and dynamic programming (GTDP) and U-Net.

INDEX TERMS Esophageal layer segmentation, optical coherence tomography, fully convolutional network, medical image analysis.

FIGURE 1. Demonstration of (a) a typical esophageal OCT image from a mouse and (b) the corresponding manual segmentation result.

B. FRAMEWORK AND ARCHITECTURE

This study proposes the D-UCN framework to segment esophageal layers from OCT images using a dual-stage strategy. It consists of two cascaded parts with the same U-shape network architecture, UCN-I and UCN-II, as shown in Fig.