Accurate wind power forecasting is essential both for optimal grid scheduling and for the large-scale integration of wind power into the grid. However, forecasting is challenging for two reasons. First, the contribution of each meteorological feature to wind power output changes continuously across time periods and weather conditions. Second, the periodic components of the wind power sequence overlap. To address these problems, this paper establishes a short-term wind power forecasting model that integrates a gated recurrent unit (GRU) network with a dual attention mechanism (DAM). First, a feature attention mechanism (FAM) is applied to historical wind power data and meteorological information to compute the contribution of each feature in real time. The GRU network then uses the feature sequences produced by the FAM to generate preliminary forecasts. Next, one-dimensional convolution with several distinct convolution kernels filters the GRU outputs. In addition, a multi-head time attention mechanism (MHTAM) with a Gaussian bias is proposed to assign different weights to the time steps of each modality. The final forecast is obtained by combining the outputs of the MHTAM. Simulation results show that for 5-h, 10-h, and 20-h short-term wind power forecasting, the proposed DAM-GRU model outperforms the comparison models on six metrics: root mean square error (RMSE), mean absolute error (MAE), R-squared (R2), sum of squared errors (SSE), mean absolute percentage error (MAPE), and relative root mean square error (RRMSE).
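The six evaluation metrics listed in the abstract have standard definitions. The following minimal sketch computes them in plain Python for a pair of observed and forecast power series. It is an illustration only, not the paper's evaluation code, and the function name `forecast_metrics` is hypothetical. Two conventions are assumptions, because the abstract does not fix them: MAPE is expressed in percent and skips time steps where the observed power is zero, and RRMSE is taken as RMSE divided by the mean observed power, in percent.

```python
import math
from typing import Sequence


def forecast_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Compute RMSE, MAE, R2, SSE, MAPE and RRMSE for one forecast series.

    Assumed conventions (not taken from the paper):
    - MAPE is in percent and ignores time steps with zero observed power.
    - RRMSE is RMSE divided by the mean observed power, in percent.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    # Point-wise forecast errors (observed minus predicted).
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Sum of squared errors and the error metrics derived from it.
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - SSE / SST, where SST is the total sum of squares about the mean.
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sse / sst if sst > 0 else float("nan")

    # MAPE: skip zero observations to avoid division by zero (assumption).
    nonzero = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if nonzero:
        mape = 100.0 * sum(abs((t - p) / t) for t, p in nonzero) / len(nonzero)
    else:
        mape = float("nan")

    # RRMSE normalised by the mean observed value (assumption).
    rrmse = 100.0 * rmse / mean_true if mean_true != 0 else float("nan")

    return {"RMSE": rmse, "MAE": mae, "R2": r2,
            "SSE": sse, "MAPE": mape, "RRMSE": rrmse}


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]   # toy values, not data from the paper
    forecast = [1.0, 2.0, 3.0, 5.0]
    print(forecast_metrics(observed, forecast))
```

Because MAPE divides by the observed value, it can be dominated by low-output periods. This is one reason it is usually reported alongside scale-dependent metrics such as RMSE and MAE.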