In recent years, large-scale computing systems have become an important part of the computing infrastructure. Resource management based on system workload prediction is an effective way to improve application efficiency. However, accuracy and real-time performance remain the key challenges for system workload prediction models. In this paper, we first investigate the dependence on historical workload in large-scale computing systems and build a two-dimensional (day and time) time-series workload model. We then design a two-dimensional long short-term memory (LSTM) neural network cell structure. Based on this, we propose an improved LSTM prediction model and provide its mathematical description and an error back-propagation method. Furthermore, to meet the real-time requirements of system resource management, we provide a parallel improved LSTM algorithm that combines week-based dependence in the hidden layer with a weight parallelization algorithm. Comparative studies based on the actual workload of the Shanghai Supercomputer Center demonstrate that the proposed improved LSTM neural network prediction model achieves higher accuracy and better real-time performance in large-scale computing systems.
INDEX TERMS
Workload prediction, computing systems, LSTM, neural network, parallel.