Surface water is a fundamental resource in urban environments, and monitoring its spatio-temporal distribution from remotely sensed images is crucial for urban planning and management. However, methods based on low- and medium-resolution images are limited by spatial resolution and struggle to extract small water bodies accurately. Very high resolution (VHR) images have recently shown considerable potential for mapping urban land cover composition, but their fewer spectral bands, shadows, and high spectral heterogeneity hinder the application of traditional methods. In this study, we propose a method for mapping urban surface water from VHR images, called sparse superpixel-based water extraction (SSWE). The method comprises three steps: (1) clustering water bodies into sparse targets at the object level using an improved scale-adaptive simple non-iterative clustering (SA-SNIC) superpixel segmentation; (2) generating new bands from additional spectral, spatial, and derived features to increase the dimensionality of the original data and enhance the separability between water bodies and background covers; and (3) constructing a positive-negative constrained energy minimization (PNCEM) multi-target sparse detector that highlights water bodies while suppressing shadows. The proposed method was applied to GF-2 multispectral images of four cities in China. SSWE achieved the highest accuracy among the compared methods, with an average overall accuracy (OA) of 98.91% and an average kappa coefficient of 0.942. A separability analysis further indicated that SSWE effectively distinguishes urban water bodies from shadows and other land covers. Stable results can be obtained with the suggested SSWE parameters and thresholds.
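
For readers unfamiliar with the detector family named in step (3), the following is a minimal sketch of the classical single-target constrained energy minimization (CEM) filter, on which detectors of this kind build. It is not the PNCEM detector proposed here: the positive-negative, multi-target formulation and the superpixel input are not reproduced. The function name `cem_detector`, the `ridge` regularization term, and the synthetic four-band spectra are assumptions made purely for illustration.

```python
import numpy as np


def cem_detector(pixels, target, ridge=1e-6):
    """Classical CEM: minimize the mean output energy w^T R w subject to w^T d = 1.

    pixels: (n_pixels, n_bands) array of spectra.
    target: (n_bands,) target signature d (here, a hypothetical water spectrum).
    ridge:  small diagonal loading (an assumption) to keep R well conditioned.
    """
    pixels = np.asarray(pixels, dtype=float)
    target = np.asarray(target, dtype=float)
    n_pixels, n_bands = pixels.shape

    # Sample autocorrelation matrix R = (1/N) * sum_i x_i x_i^T.
    R = pixels.T @ pixels / n_pixels + ridge * np.eye(n_bands)

    # Closed-form solution: w = R^{-1} d / (d^T R^{-1} d).
    r_inv_d = np.linalg.solve(R, target)
    weights = r_inv_d / (target @ r_inv_d)

    # Detector output per pixel; values near 1 resemble the target.
    return pixels @ weights, weights


# Synthetic scene (illustrative values only, not GF-2 data).
rng = np.random.default_rng(0)
background_mean = np.array([0.30, 0.35, 0.40, 0.60])
water = np.array([0.12, 0.10, 0.06, 0.02])
background = background_mean + rng.normal(scale=0.02, size=(500, 4))
water_pixels = water + rng.normal(scale=0.005, size=(20, 4))
scene = np.vstack([background, water_pixels])

scores, weights = cem_detector(scene, water)
```

By construction, the filter responds with exactly 1 to the target signature, so thresholding `scores` yields a detection map. In this toy scene the water pixels score close to 1 while background pixels are suppressed toward 0.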