To understand the interaction between water resources and food security, it is critical to examine the hidden water footprint (WF) of food consumption and its underlying drivers within specific nations or regions. This study investigates changes in the quality and structure of food consumption in China’s urban and rural areas from 2000 to 2020. After calculating the WF of food consumption for both urban and rural populations, the study uses ArcGIS 10.6 to map the spatial pattern of the provincial per capita WF. A random forest model is then used to identify the main determinants of the food consumption WF in urban and rural contexts. Quantitatively, the per capita food WF of rural populations has grown faster than that of urban populations, accompanied by a notable rise in the share of meat and poultry consumption. Spatially, the regions with a high urban per capita WF have shifted from the west toward the southeast and northeast, whereas rural areas show a marked east–west divide. In terms of drivers, economic variables are the most important determinants of the food WF for urban populations, while natural and technological factors are more prominent in rural areas. These findings have implications for promoting balanced nutritional intake among China’s urban and rural populations, alleviating food-related pressure on water resources, and optimizing water resource utilization.
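The per capita calculation mentioned above can be illustrated with a minimal sketch. It assumes a common formulation in which the food WF is the sum, over food categories, of per capita consumption multiplied by a per-unit WF coefficient; the abstract does not state the exact formula used. The function names, category names, and all numbers below are hypothetical placeholders, not data from the study.

```python
# Minimal sketch, assuming WF = sum over categories of
# (per capita consumption in kg/year) * (unit WF in m^3/kg).
# All names and values here are hypothetical, for illustration only.

def per_capita_food_wf(consumption_kg, unit_wf_m3_per_kg):
    """Return the total per capita food WF in m^3 per person per year."""
    missing = set(consumption_kg) - set(unit_wf_m3_per_kg)
    if missing:
        raise KeyError(f"no unit WF coefficient for: {sorted(missing)}")
    return sum(qty * unit_wf_m3_per_kg[food] for food, qty in consumption_kg.items())


def category_shares(consumption_kg, unit_wf_m3_per_kg):
    """Return each category's share of the total WF (shares sum to 1)."""
    total = per_capita_food_wf(consumption_kg, unit_wf_m3_per_kg)
    if total == 0:
        raise ValueError("total WF is zero; shares are undefined")
    return {
        food: qty * unit_wf_m3_per_kg[food] / total
        for food, qty in consumption_kg.items()
    }


# Placeholder inputs (not values from the study).
example_consumption = {"grain": 100.0, "meat_poultry": 20.0, "vegetables": 100.0}
example_unit_wf = {"grain": 1.0, "meat_poultry": 5.0, "vegetables": 0.2}

total_wf = per_capita_food_wf(example_consumption, example_unit_wf)
shares = category_shares(example_consumption, example_unit_wf)
```

Computing category shares alongside the total makes it possible to track structural shifts, such as a rising meat and poultry share, separately from changes in the overall per capita WF.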