Automatic extraction of building footprints from high-resolution satellite imagery is an important and challenging research problem that has attracted growing attention. Many recent studies have explored deep learning-based semantic segmentation methods to improve the accuracy of building extraction. However, public geographic information system (GIS) map datasets, which record substantial land cover and land use information (e.g., buildings, roads, and water), have rarely been used to improve building extraction results. In this research, we propose a U-Net-based semantic segmentation method for extracting building footprints from high-resolution multispectral satellite images, using the SpaceNet building dataset provided in the DeepGlobe Satellite Challenge of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (CVPR 2018). We explore the potential of multiple public GIS map datasets (OpenStreetMap, Google Maps, and MapWorld) by integrating them with WorldView-3 satellite imagery of four cities (Las Vegas, Paris, Shanghai, and Khartoum). Several strategies are designed and combined with the U-Net-based semantic segmentation model, including data augmentation, post-processing, and integration of the GIS map data with the satellite images. The proposed method achieves a total F1-score of 0.704, an improvement of 1.1% to 12.5% over the top three solutions in the SpaceNet Building Detection Competition and of 3.0% to 9.2% over the standard U-Net-based method. Moreover, we analyze in detail the effect of each proposed strategy and the possible reasons for the building footprint extraction results, taking into account the actual conditions in each of the four cities.
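
For reference, an F1-score such as the one reported above is the harmonic mean of precision and recall. The following is a minimal Python sketch under stated assumptions: the function name `f1_score` and the counts are hypothetical illustrations, not values from this study, and the rule for matching predicted footprints to ground-truth footprints is not described here and is therefore left out.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score given counts of true positives (tp),
    false positives (fp), and false negatives (fn).

    How a prediction is judged to be a true positive is an assumption
    left to the caller; this function only combines the counts.
    """
    if tp == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predictions that are correct
    recall = tp / (tp + fn)     # fraction of ground truth that is found
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration only (not results from the paper).
example = f1_score(tp=70, fp=30, fn=30)
```

With equal numbers of false positives and false negatives, precision and recall coincide, so the F1-score equals both of them.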