Score-P is a measurement infrastructure originally designed for the performance analysis and optimization of HPC codes. Recent extensions of Score-P and its associated tools now also allow the investigation of energy-related properties and support the user in implementing corresponding improvements. Since it would be counterproductive to completely ignore performance issues in this context, the focus should not lie exclusively on energy. We therefore aim to optimize software with respect to an objective function that takes both energy and run time into account.

1 The basic problem and static tuning approaches for its solution

To satisfy the demands of the scientific computing community, the established high performance computing centers provide a large amount of massively parallel computing hardware. One of the main challenges that HPC centers face today is the cost of the energy required to operate this hardware, which already amounts to about 30% of the total cost of ownership of a current HPC system, with a rising tendency [1]. HPC centers will therefore likely force their users to optimize their software with respect to its energy requirements. In this paper, we provide a brief survey of recent developments in this area.

An early strategy implemented by certain HPC centers [2] was to set the default CPU clock frequency f of their systems to a value much lower than the highest possible frequency. This concept is based on the fact that the total power required by a compute job can be additively decomposed into a static component P_st = const (known as idle power) and a dynamic part that depends on f and on the voltage U as P_dyn ∼ U² · f, where U needs to be raised when f is increased in order to keep the operation stable. The job's total energy is E = ∫₀ᵀ (P_st + P_dyn) dt. An increase in f decreases the run time T, so the static component of the total energy decreases while the dynamic component may increase. Using a few test runs with typical input data sets, one tries to gain an impression of the nature of this dependence. Fig. 1 shows the result for a specific example.

As expected, a moderate reduction of the clock frequency decreases the energy consumption but also significantly increases the run time. A straightforward application of the idea thus implies that the available hardware cannot be fully utilized, i.e. the number of program runs that can be executed during the system's life cycle is lower than it could be. To justify the hardware investment, it is therefore not reasonable to focus on energy alone. A more useful metric is the energy delay product EDP = E · T^w, where E is the energy, T is the run time, and w is a parameter weighting energy and run time according to the policy of the specific HPC center; typical values are w ∈ {1, 2, 3}. An optimization with respect to such a metric leads to a strategy that permits using higher frequencies in spite of possibly larger energy requirements if the run time savings are sufficiently large.
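To make the trade-off concrete, the following minimal sketch evaluates the simple power model sketched above over a range of clock frequencies. All constants (the idle power, the assumed voltage-frequency relation U(f), the amount of work, and the weighting exponent w) are illustrative assumptions rather than values from the paper; the point is only that the pure-energy minimum and the EDP minimum generally lie at different frequencies.

```c
#include <stdio.h>
#include <math.h>

/* Illustrative model (all constants are assumptions, not measurements):
 *   T(f)     = WORK / f                  -- run time scales inversely with frequency
 *   U(f)     = U0 + K * f                -- voltage must rise with frequency
 *   P_dyn(f) = C * U(f)^2 * f            -- dynamic power ~ U^2 * f
 *   E(f)     = (P_st + P_dyn(f)) * T(f)  -- total energy of the job
 *   EDP(f)   = E(f) * T(f)^w             -- energy delay product
 */
int main(void)
{
    const double WORK  = 1.0e12;          /* abstract work (cycles), assumed */
    const double P_ST  = 50.0;            /* static (idle) power in W, assumed */
    const double U0    = 0.8;             /* base voltage in V, assumed */
    const double K     = 0.25e-9;         /* voltage slope in V/Hz, assumed */
    const double C     = 30.0;            /* dynamic-power constant, assumed */
    const double W_EXP = 2.0;             /* weighting exponent w in EDP = E * T^w */

    printf("%8s %10s %12s %14s\n", "f [GHz]", "T [s]", "E [J]", "EDP");
    for (double f = 1.0e9; f <= 3.0e9 + 1.0e6; f += 0.25e9) {
        double T     = WORK / f;                  /* run time */
        double U     = U0 + K * f;                /* operating voltage at f */
        double p_dyn = C * U * U * (f * 1e-9);    /* f in GHz keeps C a small constant */
        double E     = (P_ST + p_dyn) * T;        /* total energy */
        double edp   = E * pow(T, W_EXP);         /* weighted metric */
        printf("%8.2f %10.1f %12.1f %14.3e\n", f / 1e9, T, E, edp);
    }
    return 0;
}
```

With these assumed constants, the energy minimum falls roughly in the middle of the frequency range, while the EDP minimum (here with w = 2) moves towards the highest frequency; this mirrors the qualitative argument above that an EDP-based policy can justify higher clock frequencies than a purely energy-oriented one.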