Abstract. The power wall has become a dominant obstacle in exascale system design. It is therefore important to understand how to write application software that minimizes power usage while maintaining satisfactory performance. In this work, we use existing software and hardware facilities to tune applications for several metrics that combine power and performance. The tuning is done with respect to software-level, performance-related tunables (cache tiling factors and loop unrolling factors) as well as processor clock frequency. These tunable parameters are explored via an offline search to find the parameter combinations that are optimal with respect to performance (or delay, D), energy (E), energy×delay (E × D), and energy×delay×delay (E × D²). The searches are applied to a parallel application that solves Poisson's equation using stencil computations. Stencil (nearest-neighbor) computations are very common operations in today's scientific applications. We show that the parameter configuration that minimizes energy consumption saves, on average, 5.4% energy at a performance loss of 4% when compared to the configuration that minimizes runtime. With the work presented in this paper, we thus provide evidence that opportunities exist to auto-tune for energy in parallel applications.
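As an illustration of how the four objectives can select different configurations, the following minimal Python sketch picks the best configuration per metric from a set of offline measurements. It is not the search used in this work: the `Sample` record, the `best_per_metric` helper, and every number in `SAMPLES` are hypothetical and chosen only to show the selection step.

```python
from typing import Callable, Dict, List, NamedTuple


class Sample(NamedTuple):
    """One measured configuration (hypothetical record layout)."""
    tile: int          # cache tiling factor
    unroll: int        # loop unrolling factor
    freq_ghz: float    # processor clock frequency
    delay_s: float     # measured runtime D, in seconds
    energy_j: float    # measured energy E, in joules


# The four objectives named in the abstract.
METRICS: Dict[str, Callable[[Sample], float]] = {
    "D": lambda s: s.delay_s,
    "E": lambda s: s.energy_j,
    "ExD": lambda s: s.energy_j * s.delay_s,
    "ExD^2": lambda s: s.energy_j * s.delay_s ** 2,
}


def best_per_metric(samples: List[Sample]) -> Dict[str, Sample]:
    """Return the sample that minimizes each metric."""
    return {name: min(samples, key=f) for name, f in METRICS.items()}


# Made-up measurements, for illustration only (not results from the paper).
SAMPLES = [
    Sample(16, 2, 2.4, 10.0, 1000.0),
    Sample(32, 4, 2.4, 9.0, 990.0),
    Sample(32, 4, 1.8, 9.5, 930.0),
    Sample(64, 8, 1.2, 14.0, 940.0),
]

if __name__ == "__main__":
    for name, s in best_per_metric(SAMPLES).items():
        print(f"{name:6s} -> tile={s.tile} unroll={s.unroll} f={s.freq_ghz} GHz")
```

With these invented numbers, D and E × D² favor the fastest configuration, while E and E × D favor the configuration running at the lower clock frequency. This shows why the choice of objective matters for the tuning result.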