The dike-pond system is a type of aquaculture pond distributed across the low-lying river deltas of China’s eastern coast. With the swift growth of the coastal economy, the water surfaces of the dike-pond system (WDPS) play a major role, because pond aquaculture yields more profit than dike agriculture. This study explores the performance of deep learning methods for extracting WDPS from high spatial resolution remote sensing images. We developed three fully convolutional network (FCN) models, SegNet, UNet, and UNet++, and compared them with two traditional methods in the same testing regions of the Guangdong–Hong Kong–Macao Greater Bay Area. The extraction results of the five methods are evaluated in three parts. The first part is a general comparison, which shows that the biggest advantage of the FCN models over the traditional methods is the P-score, with an average lead of 13%, whereas the R-score is not ideal. Our analysis reveals that the low R-score results from omitting the outer ring of individual WDPS rather than from missing entire WDPS. We also analyze the reasons for this omission and propose potential solutions. The second part examines extraction errors and demonstrates that the results of the FCN models contain few connected, jagged, or perforated WDPS, which benefits the assessment of fishery production, pattern changes, ecological value, and other applications of WDPS. The WDPS extracted by the FCN models are visually close to the ground truth, which is one of the most significant improvements over the traditional methods. The third part covers special scenarios, including various shape types, intricate spatial configurations, and multiple pond conditions. 
WDPS that have irregular shapes or are juxtaposed with other land types increase the difficulty of extraction, but the FCN models still achieve P-scores above 0.95 in the first two scenarios. In contrast, WDPS in multiple pond conditions cause a sharp drop in the indicators of all the methods, which calls for further improvement. Based on the combined performance of the methods, we provide recommendations for their use. This study offers valuable insights for enhancing deep learning methods and leveraging extraction results in practical applications.
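
The pattern of a high P-score with a low R-score can be illustrated with a minimal sketch. It assumes that the P-score and R-score denote pixel-wise precision and recall, which the text above does not define explicitly. The 12 × 12 grid, the single square pond, and the helper names `make_pond` and `precision_recall` are hypothetical and are not taken from the study's data or code.

```python
def make_pond(size, top, left, height, width):
    """Return a size x size binary grid with one rectangular pond (1 = water)."""
    grid = [[0] * size for _ in range(size)]
    for r in range(top, top + height):
        for c in range(left, left + width):
            grid[r][c] = 1
    return grid


def precision_recall(pred, truth):
    """Pixel-wise precision and recall of a predicted mask against ground truth."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # water correctly extracted
            elif p and not t:
                fp += 1  # non-water wrongly extracted as water
            elif t and not p:
                fn += 1  # water that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth: one 10 x 10 pond (100 water pixels).
truth = make_pond(12, 1, 1, 10, 10)
# Hypothetical prediction: the same pond with its one-pixel outer ring omitted.
pred = make_pond(12, 2, 2, 8, 8)

p_score, r_score = precision_recall(pred, truth)
print(p_score, r_score)  # 1.0 0.64
```

In this toy case the pond is detected and every extracted pixel is correct, so precision stays at 1.0, while the missing one-pixel ring alone lowers recall to 0.64. This is consistent with the interpretation that the low R-score reflects omitted pond edges rather than undetected ponds.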