Semantic segmentation is primarily employed to generate accurate prediction labels for each pixel of the input image, and then classify the images according to the generated labels. Semantic segmentation of building and water in remote sensing images helps us to conduct reasonable land planning for a city. However, many current mature networks face challenges in simultaneously attending to both contextual and spatial information when performing semantic segmentation on remote sensing imagery. This often leads to misclassifications and omissions. Therefore, this paper proposes a Dual-Branch Network with Spatial Supplementary Information (SPNet) to address the aforementioned issues. We introduce a Context-aware Spatial Feature-Extractor Unit (CSF) to extract contextual and spatial information, followed by the Feature-Interaction Module (FIM) to supplement contextual semantic information with spatial details. Additionally, incorporating the Goal-Oriented Attention Mechanism helps in handling noise. Finally, to obtain more detailed branches, a Multichannel Deep Feature-Extraction Module (MFM) is introduced to extract features from shallow-level network layers. This branch guides the fusion of low-level semantic information with high-level semantic information. Experiments were conducted on building and water datasets, respectively. The results indicate that the segmentation accuracy of the model proposed in this paper surpasses that of other existing mature models. On the building dataset, the mIoU reaches 87.57, while on the water dataset, the mIoU achieves 96.8, which means that the model introduced in this paper demonstrates strong generalization capabilities.