Summary
The power consumption of current high‐performance computing (HPC) systems has to be reduced by at least one order of magnitude before they can be scaled up towards ExaFLOP performance. While we can expect novel hardware technologies and architectures to contribute towards this goal, significant advances must also come from software technologies such as proactive and power‐aware scheduling, resource allocation, and fault‐tolerant computing. The development of these software technologies in turn relies heavily on our ability to model and accurately predict power consumption in large computing systems. In this paper, we present a data‐driven model of power consumption for a hybrid supercomputer (which held the top spot in the Green500 ranking in June 2013) that combines CPU, GPU, and MIC technologies to achieve high levels of energy efficiency. Our model takes as input workload characteristics (the number and location of the resources used by each job at a given time) and calculates a predicted power consumption at the system level. The model is application‐code‐agnostic and is based solely on a data‐driven predictive approach, in which log data describing past jobs in the system are used to estimate future power consumption. To this end, three model components are developed and integrated. The first employs support vector regression to predict the power usage of jobs before they start. The second uses a simple heuristic to predict the length of jobs, again before they start. These two predictions are then combined to estimate the power consumption attributable to each job at all computational elements in the system. The third component is a linear model that takes as input the power consumption at the computing units and predicts system‐wide power consumption. Our method achieves highly accurate predictions starting solely from workload information and user histories.
The model can be applied to power‐aware scheduling and power capping: alternative workload dispatching configurations can be evaluated from a power perspective and more efficient ones can be selected. The methodology outlined here can be easily adapted to other HPC systems where the same types of log data are available.
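The way the three components fit together can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: the per‐job power and duration values are assumed to be already available (standing in for the outputs of the support vector regression and of the job‐length heuristic), the linear model is reduced to a single feature (total power at the computing units), and all names (`JobPrediction`, `node_power_timeline`, `fit_linear`, `predict_system_power`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobPrediction:
    """Per-job outputs of the first two components (assumed to be given here)."""
    start: int             # first time step at which the job runs
    duration: int          # predicted job length, in time steps
    power_per_node: float  # predicted power draw per allocated node, in watts
    nodes: tuple           # identifiers of the nodes allocated to the job


def node_power_timeline(jobs, n_steps):
    """Combine power and length predictions into total node power per time step."""
    total = [0.0] * n_steps
    for job in jobs:
        end = min(job.start + job.duration, n_steps)
        for t in range(max(job.start, 0), end):
            total[t] += job.power_per_node * len(job.nodes)
    return total


def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_system_power(jobs, n_steps, slope, intercept):
    """Map predicted node-level power to system-level power with the linear model."""
    return [slope * p + intercept for p in node_power_timeline(jobs, n_steps)]


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the system.
    history_nodes = [0.0, 100.0, 200.0]    # logged total node power (W)
    history_system = [500.0, 620.0, 740.0]  # logged system-level power (W)
    slope, intercept = fit_linear(history_nodes, history_system)

    candidate = [
        JobPrediction(start=0, duration=2, power_per_node=100.0, nodes=("n1", "n2")),
        JobPrediction(start=1, duration=3, power_per_node=50.0, nodes=("n3",)),
    ]
    print(predict_system_power(candidate, 4, slope, intercept))
```

In a power‐aware scheduling setting, `predict_system_power` could be called once per candidate dispatching configuration, and a configuration whose predicted peak stays below a given power cap could then be preferred.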