Near-surface air temperature (Tair) is critical for addressing urban challenges in China, particularly under rapid urbanization and climate change. Although many studies estimate Tair at the national scale, they typically provide only daily values (e.g., maximum and minimum Tair), and few address sub-daily urban Tair at high spatial resolution. In this study, we integrated MODIS-based land surface temperature (LST) data with 18 auxiliary variables from 2013 to 2023 to develop a Tair estimation model for major Chinese cities, using the random forest algorithm under four diurnal and seasonal conditions: warm daytime, warm nighttime, cold daytime, and cold nighttime. Four model schemes were constructed and compared by combining different groups of auxiliary variables (time-related and space-related) with LST. Cross-validation showed that both space-related and time-related variables significantly affected model performance. The model performed best when all auxiliary variables were used, with an average RMSE of 1.6 °C (R² = 0.96); the best performance was observed for warm nighttime, with an RMSE of 1.47 °C (R² = 0.97). The importance assessment indicated that LST was the most important variable under all conditions, followed by specific humidity and convective available potential energy. Space-related variables were more important under cold conditions (or nighttime) than under warm conditions (or daytime), whereas time-related variables showed the opposite trend and were key to improving model accuracy in summer. Finally, Tair patterns were effectively estimated for two sample regions, Beijing and the Pearl River Delta. Our study offers a novel method for estimating sub-daily Tair patterns from open-source data and reveals the impacts of predictor variables on Tair estimation, which has important implications for urban thermal environment research.
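The per-condition evaluation described above can be illustrated with a minimal sketch that groups samples into the four diurnal and seasonal conditions and scores predictions with RMSE and R². This is not the study's implementation. The warm-season months, the daytime hour window, the function names, and the demo values are all hypothetical, chosen only to make the example runnable, and the predictions stand in for the output of any fitted model, such as a random forest.

```python
import math
from collections import defaultdict

# Hypothetical thresholds: the abstract does not define the warm season
# or the daytime window, so these values are assumptions.
WARM_MONTHS = {5, 6, 7, 8, 9}
DAY_START, DAY_END = 6, 18  # local hours, half-open interval [6, 18)

def condition(month, hour):
    """Label a sample with one of the four diurnal/seasonal conditions."""
    season = "warm" if month in WARM_MONTHS else "cold"
    period = "daytime" if DAY_START <= hour < DAY_END else "nighttime"
    return f"{season} {period}"

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted Tair."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r2(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def score_by_condition(samples):
    """Score (month, hour, observed, predicted) tuples per condition."""
    groups = defaultdict(lambda: ([], []))
    for month, hour, obs, pred in samples:
        obs_list, pred_list = groups[condition(month, hour)]
        obs_list.append(obs)
        pred_list.append(pred)
    return {
        name: {"rmse": rmse(o, p), "r2": r2(o, p)}
        for name, (o, p) in groups.items()
    }

# Made-up values for illustration only; not data from the study.
demo = [
    (7, 13.5, 30.0, 31.0), (7, 10.5, 34.0, 33.0),   # warm daytime
    (7, 1.5, 24.0, 24.5), (8, 22.5, 26.0, 25.5),    # warm nighttime
    (1, 10.5, 2.0, 4.0), (12, 13.5, 6.0, 4.0),      # cold daytime
    (1, 1.5, -5.0, -4.0), (2, 22.5, -1.0, -2.0),    # cold nighttime
]
scores = score_by_condition(demo)
for name, metrics in sorted(scores.items()):
    print(f"{name}: RMSE={metrics['rmse']:.2f} °C, R2={metrics['r2']:.2f}")
```

In practice the predicted values would come from held-out folds of a cross-validation, so that each condition's RMSE and R² reflect performance on samples the model has not seen.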