Physical urban boundaries (PUBs) are basic geographic information data that define the spatial extent of urban landscapes characterized by non-agricultural land and non-agricultural economic activities. Accurately mapped PUBs provide a spatiotemporal database for urban dynamic monitoring, territorial spatial planning, and ecological environment protection. However, traditional extraction methods suffer from problems such as subjective parameter settings and inconsistent cartographic scales, which make it difficult to identify PUBs objectively and accurately. To address these problems, we propose a self-supervised learning approach for PUB extraction. First, we use nighttime light and OpenStreetMap road data to map an initial urban boundary as data preparation. Then, we design a self-supervised pretext task based on an unsupervised mutation (abrupt change) detection algorithm that automatically mines supervisory information from unlabeled data, avoiding subjective human interference. Finally, we design the downstream task as a supervised learning task in Google Earth Engine that classifies urban and non-urban areas using impervious surface density and nighttime light data, which resolves the scale inconsistency problem. Using the proposed method, we produced a 30 m resolution PUB dataset for China covering six years (1995, 2000, 2005, 2010, 2015, and 2020). Our PUBs show good agreement with existing products, accurately describe the spatial extent of urban areas, and effectively distinguish urban from non-urban areas. Moreover, we found that the gap between the national per capita GDP and the urban per capita GDP is gradually decreasing, but regional coordinated development and intensive development still need to be strengthened.
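The idea behind the pretext task, letting an abrupt-change test rather than a hand-picked threshold decide where "urban" ends, can be illustrated with a minimal sketch. The abstract does not name the mutation detection algorithm, so the Pettitt change-point test below is a stand-in chosen for illustration, not the authors' method. The function name `pettitt_change_point`, the one-dimensional nighttime light `profile` sampled outward from a city core, and its values are all hypothetical.

```python
import numpy as np


def pettitt_change_point(values):
    """Locate the most likely single abrupt change in a 1-D sequence.

    Illustrative stand-in for an unsupervised mutation detection step
    (assumption: the source does not say which test it uses).
    Returns (split, p_value), where split is the index of the first
    sample after the change and p_value is Pettitt's approximation.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two samples")
    # sign_matrix[i, j] = sign(x[i] - x[j])
    sign_matrix = np.sign(x[:, None] - x[None, :])
    # U_t = sum over i < t and j >= t of sign(x[i] - x[j]), for t = 1..n-1
    u = np.array([sign_matrix[:t, t:].sum() for t in range(1, n)])
    k = int(np.argmax(np.abs(u)))
    k_stat = abs(u[k])
    # Approximate two-sided significance of the detected change
    p_value = min(1.0, 2.0 * float(np.exp(-6.0 * k_stat**2 / (n**3 + n**2))))
    return k + 1, p_value


# Hypothetical nighttime light values along a transect, city core first.
profile = [60, 58, 61, 59, 57, 60, 62, 58, 59, 61,
           5, 4, 6, 3, 5, 4, 6, 5, 3, 4]

split, p_value = pettitt_change_point(profile)

# Pseudo-labels mined without manual thresholds:
# True = urban side of the change, False = non-urban side.
pseudo_labels = np.arange(len(profile)) < split
```

In a pipeline of this kind, pseudo-labels such as `pseudo_labels` would serve as training samples for the downstream supervised classifier. How the paper actually constructs its samples is not specified in the abstract.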