Efficient processing of ultra-high-resolution images is increasingly sought after with the continuous advancement of photography and sensor technology. However, the semantic segmentation of remote sensing images lacks a satisfactory solution to optimize GPU memory utilization and the feature extraction speed. To tackle this challenge, Chen et al. introduced GLNet, a network designed to strike a better balance between GPU memory usage and segmentation accuracy when processing high-resolution images. Building upon GLNet and PFNet, our proposed method, Fast-GLNet, further enhances the feature fusion and segmentation processes. It incorporates the double feature pyramid aggregation (DFPA) module and IFS module for local and global branches, respectively, resulting in superior feature maps and optimized segmentation speed. Extensive experimentation demonstrates that Fast-GLNet achieves faster semantic segmentation while maintaining segmentation quality. Additionally, it effectively optimizes GPU memory utilization. For example, compared to GLNet, Fast-GLNet’s mIoU on the Deepglobe dataset increased from 71.6% to 72.1%, and GPU memory usage decreased from 1865 MB to 1639 MB. Notably, Fast-GLNet surpasses existing general-purpose methods, offering a superior trade-off between speed and accuracy in semantic segmentation.