Transformer together with convolutional neural network (CNN) has achieved better performance than the pure module-based methods. However, the advantages of both coding styles are not well considered, and the designed fusion modules have not achieved good effect in the aspect of remote sensing image (RSI) semantic segmentation. In this paper, to exploit local and global pixel dependencies, improved Gated Recurrent Units combined with fusion, are proposed to harness the complementary advantages of Parallel Hybrid Network for semantic segmentation of RSIs. The proposed network core is feature selection and fusion module (FSFM), which is composed by both feature selection units (FSU) and feature fusion units (FFU), named FSFM-PHN. Concretely, to precisely incorporate local and global representations, the improved reset and update gates of ConvGRU are treated as FSU and is realized the feature selection of the advantageous segmentation task. To merge the outputs from ResNet, Swin Transformer and FSU, feature fusion units (FFU) based on stack and sequential convolutional block operations is constructed. On the public Vaihingen, Potsdam and BLU datasets, the experimental results show that FSFM is effective, which outperforms state-of-the-art methods in some famous remote image semantic segmentation tasks.