Accurate wind power forecasting (WPF) is critical for optimizing grid operations and managing wind energy resources efficiently. The inherent volatility and non-stationarity of wind data make forecasting difficult, particularly in short-to-medium-term WPF, where forecast horizons are longer. To address these challenges, this study introduces TransIEMD, a novel model that integrates Improved Empirical Mode Decomposition (IEMD) with an enhanced Transformer. TransIEMD first decomposes the wind speed into Intrinsic Mode Functions (IMFs) using IEMD, turning the scalar wind speed into a vector that enriches the input data and reveals hidden temporal dynamics. Each IMF is then processed with channel attention, embedding, and positional encoding to prepare the inputs for the enhanced Transformer. The Direct Embedding Module (DEM) provides an alternative view of the input data. The distinct views produced by IEMD and DEM interact through cross-attention within the encoder, which significantly enhances the model's ability to capture dynamic wind patterns. By combining cross-attention and self-attention within the encoder–decoder structure, TransIEMD detects and exploits long-range dependencies and dynamic wind patterns more effectively, improving forecasting precision. Extensive evaluations on a publicly available dataset from the National Renewable Energy Laboratory (NREL) show that TransIEMD significantly improves forecasting accuracy across horizons of 4, 8, 16, and 24 h. Specifically, at the 24 h forecast horizon, TransIEMD reduces the normalized mean absolute error and the root mean square error by 4.24% and 4.37%, respectively, compared with the traditional Transformer. These results confirm the efficacy of integrating IEMD with attention mechanisms for improving the accuracy of WPF.
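The error metrics quoted above can be made concrete with a short sketch. The abstract does not state how the errors are normalized or whether the reported reductions are relative, so both are assumptions here: errors are divided by a rated capacity, and the reduction is computed relative to the baseline error. The function names `nmae`, `nrmse`, and `relative_reduction` are hypothetical, and the numbers are toy values for illustration, not results from the study.

```python
import math


def nmae(actual, predicted, capacity):
    """Mean absolute error divided by a normalizing constant.

    Assumption: the normalizer is the rated capacity; the source
    does not specify its convention.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total / (len(actual) * capacity)


def nrmse(actual, predicted, capacity):
    """Root mean square error divided by the same assumed normalizer."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / capacity


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy values only (MW); not data from the NREL dataset.
actual = [10.0, 12.0, 8.0, 15.0]
baseline_forecast = [12.0, 10.0, 9.0, 13.0]
model_forecast = [11.0, 11.0, 8.0, 14.0]
capacity = 20.0

baseline_nmae = nmae(actual, baseline_forecast, capacity)
model_nmae = nmae(actual, model_forecast, capacity)
print(f"NMAE reduction: {relative_reduction(baseline_nmae, model_nmae):.2f}%")
```

Under a relative-reduction reading, a figure such as 4.24% would mean the model's error is 4.24% smaller than the baseline's, not 4.24 percentage points lower.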