Numerical weather prediction (NWP) provides the future state of the atmosphere and is a major tool for weather forecasting. However, NWP forecasts inevitably contain errors and require bias correction to become more accurate. NWP also relies on discrete numerical calculations, which limit the resolution of its output, so downscaling is important for obtaining detailed weather forecasts. In this paper, we take a spatio-temporal modeling approach and construct the Spatio-Temporal Transformer U-Net (ST-UNet), which builds on the U-Net framework and combines the Swin Transformer with convolution to perform bias correction and temporal downscaling. The encoder extracts features from the multi-time forecasts, and the decoder reconstructs features from the encoder output and a constructed query vector. In addition, the query builder block generates different query vectors to accomplish different tasks. Multi-time bias correction was conducted for the 2-m temperature and the 10-m wind component. The results showed that the deep learning model significantly outperformed the anomaly numerical correction with observations, and that ST-UNet outperformed both the U-Net model for single-time bias correction and the 3-dimensional U-Net (3D-UNet) model for multi-time bias correction. Forecasts from ST-UNet obtained the smallest root mean square error and the highest accuracy and correlation coefficient in both the 2-m temperature and 10-m wind component experiments. Temporal downscaling with ST-UNet was also performed to obtain hourly forecasts, which increased the temporal resolution and reduced the root mean square error by 0.78 compared to the original forecasts. Therefore, our proposed model can be applied to both bias correction and temporal downscaling tasks and achieves good accuracy.
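As an illustration of two of the evaluation metrics named above, the root mean square error and the correlation coefficient can be sketched as follows. This is a minimal sketch using the standard textbook definitions (Pearson correlation is assumed here), not necessarily the exact formulation used in the experiments. The function names `rmse` and `pearson_r` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def rmse(forecast, observed):
    """Root mean square error between paired forecast and observed values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(forecast, observed):
    """Pearson correlation coefficient (assumed definition, see lead-in)."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)


# Made-up illustrative values, not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
raw = [2.0, 3.0, 4.0, 5.0]        # forecast with a constant +1 bias
corrected = [1.0, 2.0, 3.0, 5.0]  # hypothetical bias-corrected forecast

print(rmse(raw, observed), rmse(corrected, observed))
print(pearson_r(raw, observed), pearson_r(corrected, observed))
```

The toy values show why both metrics are reported together: a constant bias leaves the correlation coefficient at its maximum while still producing a nonzero root mean square error, so correlation alone cannot reveal the need for bias correction.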