Urban ecosystem dysfunction, habitat fragmentation, and biodiversity loss caused by rapid urbanization threaten sustainable urban development. Urban habitat quality is an important indicator for assessing the urban ecological environment, so studying its driving mechanism and integrating the results into urban planning is of great practical significance. In this study, taking Zhengzhou, China, as an example, the InVEST model was used to analyze the spatial differentiation of urban habitat quality, and Geodetector software was used to explore the driving mechanism of habitat quality at different grid scales. The results show the following: (1) Land use and cover change (LUCC), altitude, slope, surface roughness, relief amplitude, population, nighttime light, and the normalized difference vegetation index (NDVI) are the dominant factors affecting the spatial differentiation of habitat quality. Among them, the impacts of slope, surface roughness, population, nighttime light, and NDVI on habitat quality are highly sensitive to grid scale. At grid scales of 1000 to 1250 m, the impacts of the dominant factors on habitat quality are closest to their mean level across the scales examined. (2) The factors differ in their impact on the spatial distribution of habitat quality, and the differences between most factors remain significant regardless of grid scale. The combined impact of any two factors on the spatial distribution of habitat quality is greater than the impact of either factor alone. (3) Based on these results and the local conditions of Zhengzhou, we propose directions for habitat protection centered on adjusting the urban land use structure, applying nature-based solutions, and establishing a systematic thinking model for multi-level urban habitat sustainability.
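The factor and interaction analysis summarized above can be illustrated with a minimal sketch of the q-statistic as it is commonly defined for the geographical detector, q = 1 - SSW/SST, where SSW is the within-stratum sum of squares of habitat quality and SST is the total sum of squares. This is not the Geodetector software itself or the authors' workflow: the function name `q_statistic`, the variable names, and the toy values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def q_statistic(values, strata):
    """Return q = 1 - SSW / SST for one stratified factor.

    values: habitat-quality value per grid cell (hypothetical toy input).
    strata: class label of the explanatory factor for the same cells.
    """
    n = len(values)
    if n == 0 or n != len(strata):
        raise ValueError("values and strata must be non-empty and equal in length")

    # Total sum of squares around the global mean.
    mean = sum(values) / n
    sst = sum((v - mean) ** 2 for v in values)
    if sst == 0:
        return 0.0  # no spatial variation to explain

    # Group cell values by stratum label.
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)

    # Within-stratum sum of squares around each stratum mean.
    ssw = 0.0
    for cells in groups.values():
        m = sum(cells) / len(cells)
        ssw += sum((v - m) ** 2 for v in cells)

    return 1.0 - ssw / sst


# Toy data, invented for illustration (not from the study).
quality = [0.1, 0.1, 0.9, 0.9]
land_use = ["built", "built", "forest", "forest"]  # separates the values
slope_class = ["low", "high", "low", "high"]       # does not separate them

q_land_use = q_statistic(quality, land_use)
q_slope = q_statistic(quality, slope_class)

# Interaction of two factors: overlay their strata and recompute q.
q_interaction = q_statistic(quality, list(zip(land_use, slope_class)))
```

In this sketch a factor whose strata separate high and low habitat quality yields a q close to 1, while a factor unrelated to the pattern yields a q close to 0. Overlaying two factors can only refine the strata, so the interaction q is never smaller than either single-factor q, which mirrors the kind of two-factor comparison reported in result (2). Repeating the calculation on grids of different cell size is one way to examine scale sensitivity.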