Abstract. Over the life of a modern computer, the energy cost of running the system can exceed the cost of the original hardware purchase. This has driven the community to try to understand and minimize energy costs wherever possible. Toward this end, we present an automated, fine-grained approach to selecting per-loop processor clock frequencies. The clock frequency selection criteria are established through a combination of lightweight static analysis and runtime tracing that automatically acquires application signatures: characterizations of the patterns of execution of each loop in an application. Each application characterization is matched against a series of benchmark loops, which have been run on the target system and exercise it in various ways. These benchmarks are intended to form a covering set, a machine characterization of the expected power consumption and performance traits of the machine over the space of execution patterns and clock frequencies. For each application loop, we choose the frequency that confers the best power-delay product on the benchmark that most closely resembles that loop. The application's frequency management strategy is then permanently integrated into the compiled executable via static binary instrumentation. This process is lightweight and has to be done only once per application (and the benchmarks run only once per machine), so it is much less laborious than running every application loop at every possible frequency on the machine to find the optimal frequencies. Unlike most frequency management schemes, we switch frequencies very often, potentially at every loop entry and exit, saving as much as 10% of the energy bill in the process. The set of tools that implements this scheme is fully automated, built on top of freely available open source software, and uses an inexpensive power measurement apparatus.
We use these tools to show measured, system-wide energy savings of up to 7.6% on an 8-core Intel Xeon E5530 and 10.6% on a 32-core AMD Opteron 8380 (a Sun X4600 node) across a range of workloads.
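The selection step described above (match each application loop to its closest benchmark loop, then adopt the frequency with the best power-delay product for that benchmark) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the tools themselves: the two-element signature (compute fraction, memory fraction), the Euclidean distance used for matching, the benchmark names, and all power and runtime values are hypothetical.

```python
import math

# Hypothetical machine characterization. Each benchmark loop has a signature
# (an illustrative feature vector) and, per clock frequency in GHz, a pair of
# (average power in watts, runtime in seconds). All numbers are made up.
BENCHMARKS = {
    "compute_bound": {
        "signature": (0.95, 0.05),
        "runs": {1.6: (180.0, 10.0), 2.0: (200.0, 8.0), 2.4: (230.0, 6.7)},
    },
    "memory_bound": {
        "signature": (0.10, 0.90),
        "runs": {1.6: (170.0, 10.0), 2.0: (195.0, 9.8), 2.4: (225.0, 9.7)},
    },
}


def power_delay_product(power_w, delay_s):
    """Power-delay product of one benchmark run; lower is better."""
    return power_w * delay_s


def best_frequency(runs):
    """Return the frequency whose run minimizes the power-delay product."""
    return min(runs, key=lambda freq: power_delay_product(*runs[freq]))


def closest_benchmark(signature, benchmarks):
    """Return the name of the benchmark nearest to the given signature.

    Euclidean distance is an assumed similarity measure for this sketch.
    """
    return min(
        benchmarks,
        key=lambda name: math.dist(signature, benchmarks[name]["signature"]),
    )


def select_frequencies(app_loops, benchmarks):
    """Map each application loop to the best frequency of its closest benchmark."""
    return {
        loop: best_frequency(benchmarks[closest_benchmark(sig, benchmarks)]["runs"])
        for loop, sig in app_loops.items()
    }


if __name__ == "__main__":
    # Hypothetical application signatures for two loops.
    app_loops = {"stencil_loop": (0.2, 0.8), "dgemm_loop": (0.9, 0.1)}
    print(select_frequencies(app_loops, BENCHMARKS))
```

In this sketch the benchmark table is built once per machine and the application signatures once per application, mirroring the one-time costs described above; the resulting loop-to-frequency map is what would then be handed to the binary instrumentation step.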