To manage cloud resources efficiently, improve quality of service, and avoid Service-Level Agreement (SLA) violations, it is important to forecast cloud workload accurately. Prior work on cloud workload forecasting is mainly based on Recurrent Neural Networks (RNNs). However, in highly-dynamic cloud workload scenarios, where resource utilization changes faster and more frequently, these RNN-based methods fail to capture the linear and non-linear relationships in the workload and cannot give accurate forecasts, because the classic RNN suffers from the vanishing gradient problem. To address this issue, we propose an Ensemble Forecasting Approach (EFA) for highly-dynamic cloud workload that applies Variational Mode Decomposition (VMD) and R-Transformer. Specifically, to reduce the non-stationarity and high randomness of highly-dynamic cloud workload sequences, we decompose the workload into multiple Intrinsic Mode Functions (IMFs) by VMD. The IMFs are then fed into our ensemble forecasting module, which is based on R-Transformer and an autoregressive model, in order to capture the long-term dependencies and local non-linear relationships of workload sequences. The effectiveness and adaptability of the proposed EFA are verified on real-world workloads from Google and Alibaba cluster traces. Moreover, the performance evaluation results show that EFA achieves higher forecasting accuracy than prior related works over various forecasting time lengths for highly-dynamic cloud workload.

INDEX TERMS Workload forecasting, cloud computing, deep learning, variational mode decomposition.
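
The decompose-then-forecast idea summarized above can be illustrated with a minimal sketch. This is not the EFA implementation: a moving-average split stands in for VMD, a least-squares AR(1) fit stands in for the R-Transformer and autoregressive ensemble module, and summing the per-component forecasts is an assumed aggregation rule. All function names are hypothetical.

```python
from statistics import fmean


def moving_average(series, window):
    """Centered moving average, truncated at the edges (a crude smooth component)."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(fmean(series[lo:hi]))
    return out


def decompose(series, window=5):
    """Stand-in for VMD (assumption): split the series into a smooth component
    and a residual. The two components sum back to the input."""
    smooth = moving_average(series, window)
    residual = [x - s for x, s in zip(series, smooth)]
    return [smooth, residual]


def fit_ar1(component):
    """Least-squares fit of x[t] = a * x[t-1] + b on one component."""
    x, y = component[:-1], component[1:]
    mx, my = fmean(x), fmean(y)
    var = sum((xi - mx) ** 2 for xi in x)
    # A constant component has zero variance; fall back to predicting its mean.
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / var if var > 0 else 0.0
    b = my - a * mx
    return a, b


def forecast_component(component, steps):
    """Roll the fitted AR(1) model forward for the requested number of steps."""
    a, b = fit_ar1(component)
    preds, last = [], component[-1]
    for _ in range(steps):
        last = a * last + b
        preds.append(last)
    return preds


def ensemble_forecast(series, steps=3, window=5):
    """Forecast each component separately, then sum the forecasts
    (assumed aggregation rule, not taken from the paper)."""
    components = decompose(series, window)
    per_component = [forecast_component(c, steps) for c in components]
    return [sum(vals) for vals in zip(*per_component)]


if __name__ == "__main__":
    # Made-up utilization values, for illustration only.
    workload = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 21.0, 20.0, 24.0]
    print(ensemble_forecast(workload, steps=3))
```

The point of the structure is that each component is smoother or more regular than the raw sequence, so a simple per-component model has an easier task than one model fitted to the whole signal. In practice the split would come from an actual VMD routine and the per-component forecaster would be a learned sequence model.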