Economic development in China has been severely constrained by environmental problems such as carbon emissions. Improving green total factor productivity (GTFP) is a key pathway to achieving peak carbon emissions and carbon neutrality. However, studies of China's urban GTFP under carbon emission constraints remain limited. In this context, this study combines the directional distance function (DDF), with carbon emissions treated as an undesirable output, and the global Malmquist–Luenberger (GML) productivity index to measure the GTFP of Chinese cities. On this basis, the Dagum Gini coefficient, kernel density estimation, and convergence models are used to examine regional differences, distribution dynamics, and convergence for China as a whole and for its eastern, central, and western regions. The core conclusions are as follows: (1) the average annual growth rate of urban GTFP in China is about 0.7064%, which is relatively low and leaves considerable room for improvement. GTFP shows a clear upward trend in all three regions, with a spatial distribution pattern of "high in the east, low in the west"; (2) regional differences in urban GTFP are widening, with the largest gap in the eastern region and the smallest in the western region, and intraregional differences are the main source of the overall difference; (3) urban GTFP in China is markedly unbalanced, with pronounced gradient differences that make it difficult for cities to move from lower to higher GTFP levels, and the central and western regions even show multilevel differentiation; (4) China's urban GTFP exhibits both absolute and conditional β convergence but no σ convergence.
These findings suggest that it is necessary to fully embrace and actively implement the concept of shared development, promote technological progress, focus on narrowing GTFP differences, and foster coordinated green development within regions.
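Finding (2) rests on the Dagum Gini decomposition, which splits the overall Gini coefficient into a within-region contribution, a net between-region contribution, and a transvariation (overlap) contribution. The Python sketch below shows the textbook form of that decomposition under stated assumptions: the function name `dagum_gini` is hypothetical, each region is passed as a plain array of city GTFP values, and the sample numbers are made up for illustration only. They are not data or results from this study, and the study's own implementation may differ in detail (for example, in how it handles panel years).

```python
import numpy as np


def dagum_gini(groups):
    """Hypothetical sketch of the Dagum Gini decomposition.

    `groups` is a list of 1-D arrays, one per region (e.g. east, center, west).
    Returns (G, G_w, G_nb, G_t): overall Gini, within-region contribution,
    net between-region contribution, and transvariation contribution.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    y = np.concatenate(groups)
    n, mean = y.size, y.mean()
    # Overall Gini: sum of absolute differences over all ordered pairs.
    G = np.abs(y[:, None] - y[None, :]).sum() / (2 * n**2 * mean)

    n_j = np.array([g.size for g in groups])
    mu_j = np.array([g.mean() for g in groups])
    p = n_j / n                  # share of cities in each region
    s = n_j * mu_j / (n * mean)  # share of total GTFP held by each region

    # Within-region contribution: G_w = sum_j G_jj * p_j * s_j.
    G_w = 0.0
    for j, g in enumerate(groups):
        G_jj = np.abs(g[:, None] - g[None, :]).sum() / (2 * n_j[j] ** 2 * mu_j[j])
        G_w += G_jj * p[j] * s[j]

    # Between-region terms, split into net difference and transvariation.
    G_nb = G_t = 0.0
    for j in range(len(groups)):
        for h in range(j):
            diff = groups[j][:, None] - groups[h][None, :]
            G_jh = np.abs(diff).sum() / (n_j[j] * n_j[h] * (mu_j[j] + mu_j[h]))
            # Orient the pairwise gaps so the higher-mean region comes first.
            if mu_j[j] < mu_j[h]:
                diff = -diff
            d_jh = np.clip(diff, 0, None).mean()   # gaps favouring the richer region
            p_jh = np.clip(-diff, 0, None).mean()  # overlapping gaps (transvariation)
            D_jh = (d_jh - p_jh) / (d_jh + p_jh) if d_jh + p_jh > 0 else 0.0
            weight = G_jh * (p[j] * s[h] + p[h] * s[j])
            G_nb += weight * D_jh
            G_t += weight * (1 - D_jh)
    return G, G_w, G_nb, G_t


# Illustrative, made-up GTFP values for three hypothetical regions.
east = [1.10, 1.25, 0.98, 1.40]
center = [0.95, 1.02, 1.05]
west = [0.90, 0.93, 0.97]
G, G_w, G_nb, G_t = dagum_gini([east, center, west])
print(f"G={G:.4f}  within={G_w:.4f}  between={G_nb:.4f}  transvariation={G_t:.4f}")
```

By construction the three contributions add up exactly to the overall Gini coefficient, so the share G_w / G is what a statement such as "intraregional difference is the main source of regional differences" compares against the other two shares.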