High-resolution spatiotemporal wind speed mapping is useful for atmospheric environmental monitoring, air quality evaluation and wind power siting. Although modern reanalysis techniques provide reliable interpolated meteorological surfaces at a high temporal resolution, their spatial resolution is coarse, and the local variability of wind speed is difficult to capture because of its volatility. Here, a two-stage approach was developed for robust spatiotemporal estimation of wind speed at a high resolution. The approach consists of geographically weighted ensemble machine learning (Stage 1) and downscaling based on meteorological reanalysis data (Stage 2). The geographically weighted ensemble combines three base learners (an autoencoder-based deep residual network, XGBoost and random forest) and incorporates spatial autocorrelation and heterogeneity to improve the ensemble predictions. In Stage 2, downscaling with reanalysis data was introduced to reduce bias and abrupt (non-natural) spatial variation in the Stage 1 predictions. Here, an autoencoder-based residual network adjusts the difference between the averages of the fine-resolution predicted values and the coarse-resolution reanalysis data to ensure consistency between the two. Using mainland China as a case study, the geographically weighted regression (GWR) ensemble predictions were shown to perform better than the individual learners' predictions (with an approximately 12–16% improvement in R² and a decrease of 0.14–0.19 m/s in root mean square error). Downscaling further improved the predictions by reducing inconsistency with the reanalysis data and yielding smoother, more natural spatial variation. The proposed approach can also be applied to the high-resolution spatiotemporal estimation of other meteorological parameters or surface variables for which remote sensing images (i.e. reliable coarsely resolved data), ground monitoring data and other relevant factors are available.
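The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name, argument and kernel choice is an assumption made for illustration. `gw_ensemble_predict` combines base-learner predictions using a locally weighted least-squares fit at each target location (here with a Gaussian distance kernel and a hypothetical fixed `bandwidth`). `match_coarse_mean` uses a simple additive shift, in place of the autoencoder-based residual network, to show the consistency constraint that the mean of the fine-resolution predictions within a coarse cell should equal the reanalysis value.

```python
import numpy as np


def gw_ensemble_predict(train_xy, train_base_preds, train_obs,
                        target_xy, target_base_preds, bandwidth):
    """Geographically weighted combination of base-learner predictions.

    train_base_preds / target_base_preds: (n, k) arrays, one column per
    base learner. The kernel and bandwidth are illustrative assumptions.
    """
    # Design matrix: intercept plus one column per base learner.
    X = np.column_stack([np.ones(len(train_obs)), train_base_preds])
    out = np.empty(target_xy.shape[0])
    for i in range(target_xy.shape[0]):
        # Gaussian kernel weights: nearby training sites count more.
        d = np.linalg.norm(train_xy - target_xy[i], axis=1)
        sw = np.sqrt(np.exp(-0.5 * (d / bandwidth) ** 2))
        # Weighted least squares via ordinary lstsq on sqrt-weighted rows.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], train_obs * sw, rcond=None)
        out[i] = beta[0] + target_base_preds[i] @ beta[1:]
    return out


def match_coarse_mean(fine_pred, cell_id, coarse_value):
    """Shift fine-resolution predictions so that their mean within each
    coarse cell equals the coarse (reanalysis) value for that cell.

    cell_id: integer index of the coarse cell containing each fine pixel.
    coarse_value: array indexed by cell id.
    This additive shift is a simple stand-in, not the residual network.
    """
    adjusted = np.asarray(fine_pred, dtype=float).copy()
    for c in np.unique(cell_id):
        mask = cell_id == c
        adjusted[mask] += coarse_value[c] - adjusted[mask].mean()
    return adjusted
```

In this sketch the local regression coefficients vary with location, which is how spatial heterogeneity enters the ensemble. A uniform additive shift, unlike a learned adjustment, leaves abrupt jumps at coarse-cell boundaries untouched and can even introduce them, so it only illustrates the consistency constraint.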