Accurate information on the location, shape, and size of photovoltaic (PV) arrays is essential for power system planning and energy system development. In this study, we explore the potential of deep convolutional neural networks (DCNNs) for extracting PV arrays from high-spatial-resolution remote sensing (HSRRS) images. Previous research has focused mainly on applying DCNNs, while little attention has been paid to how different DCNN structures affect the accuracy of PV array extraction. To address this gap, we compare the performance of seven popular DCNNs (AlexNet, VGG16, ResNet50, ResNeXt50, Xception, DenseNet121, and EfficientNetB6) on a PV array dataset containing 2072 images of 1024 × 1024 pixels. Evaluated by intersection over union (IoU), four DCNNs (EfficientNetB6, Xception, ResNeXt50, and VGG16) consistently achieve IoU values above 94%. By analyzing the differences in the structures and features of these four DCNNs, we identify structural factors that contribute to the extraction of low-level spatial features (LFs) and high-level semantic features (HFs) of PV arrays. We find that a first feature-extraction block without downsampling strengthens a DCNN's ability to extract LFs, increasing IoU values by approximately 0.25%. In addition, separable convolution and attention mechanisms play a crucial role in improving HF extraction, increasing IoU values by 0.7% and 0.4%, respectively. Overall, our study provides insight into how DCNN structure affects the extraction of PV arrays from HSRRS images. These findings have implications for selecting appropriate DCNNs and designing robust DCNNs tailored to the accurate and efficient extraction of PV arrays.
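For reference, the IoU metric used throughout the comparison measures the overlap between a predicted PV-array mask and the ground-truth mask. A minimal sketch of its computation for binary segmentation masks follows; the function and variable names are illustrative, not taken from the study's codebase:

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union (Jaccard index) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Define IoU as 1.0 when both masks are empty (no PV pixels at all).
    return float(intersection) / float(union) if union > 0 else 1.0

# Toy 4x4 example: predicted PV pixels vs. ground truth.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
gt = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
print(iou(pred, gt))  # intersection 3, union 4 -> 0.75
```

In practice this is computed over every image (or accumulated over the whole test set) and averaged, so a reported gain of, say, 0.25% IoU reflects a consistent improvement in mask overlap across the dataset.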