Standard benchmarking provides the run times for given programs on given machines, but fails to provide insight into why those results were obtained (in terms of either machine or program characteristics), and fails to predict the run time of that program on some other machine, or of other programs on that machine. We have developed a machine-independent model of program execution to characterize both machine performance and program execution. By merging these machine and program characterizations, we can estimate execution time for arbitrary machine/program combinations. Our technique allows us to identify those operations, either on the machine or in the programs, that dominate the benchmark results. This information helps designers improve the performance of future machines, and helps users tune their applications to better utilize the performance of existing machines.

Here we apply our methodology to characterize benchmarks and predict their execution times. We present extensive run-time statistics for a large set of benchmarks, including the SPEC and Perfect Club suites. We show how these statistics can be used to identify important shortcomings in the programs. In addition, we give execution time estimates for a large sample of programs and machines and compare these against benchmark results. Finally, we develop a metric for program similarity that makes it possible to classify benchmarks with respect to a large set of characteristics.

† The material presented here is based on research supported principally by NASA under grant NCC2-550, and also in part by the National Science Foundation under grants MIP-8713274, MIP-9116578 and CCR-9117028,
Introduction

Benchmarking is the process of running a specific program or workload on a specific machine or system and measuring the resulting performance. This technique clearly provides an accurate evaluation of the performance of that machine for that workload. Benchmarks can be complete applications [UCB87, Dong88, MIPS89], the most heavily executed parts of a program (kernels) [Bail85, McMa86, Dodu89], or synthetic programs [Curn76, Weic88]. Unfortunately, benchmarking fails to provide insight into why those results were obtained (in terms of either machine or program characteristics), and fails to predict the run time of that program on some other machine, or of some other program on that machine [Worl84, Dong87]. This is because benchmarking characterizes neither the program nor the machine. In this paper we show that these limitations can be overcome with the help of a performance model based on the concept of a high-level abstract machine.

Our machine model consists of a set of abstract operations representing, for some particular programming language, the basic operators and language constructs present in programs. A special benchmark called a machine characterizer is used to measure experimentally the time it takes to execute each abstract operation (AbOp). Frequency counts of AbOps are obtained by instrumenting and running benchmarks. The ma...
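To make the merging step concrete: the natural reading of combining the two characterizations is a linear model in which the predicted run time of a program on a machine is the sum, over all AbOps, of the program's dynamic count for each operation multiplied by that machine's measured time per operation. The sketch below illustrates this reading; the AbOp names and all numeric values are hypothetical placeholders for illustration, not measurements from this paper.

    # Hedged sketch of the linear-combination estimate, assuming
    # predicted time = sum over AbOps of (dynamic count) * (seconds per AbOp).
    # All AbOp names and values are illustrative placeholders.

    def predict_run_time(abop_counts, abop_times):
        """Estimate execution time (seconds) of a program on a machine.

        abop_counts: dynamic frequency of each AbOp (program characterization)
        abop_times:  measured seconds per AbOp (machine characterization)
        """
        return sum(count * abop_times[op] for op, count in abop_counts.items())

    # Machine characterization: seconds per abstract operation (hypothetical).
    machine_m = {
        "add_int":    2.0e-8,  # integer add
        "mul_double": 9.0e-8,  # double-precision multiply
        "mem_load":   4.0e-8,  # load from memory
        "branch":     3.0e-8,  # conditional branch
    }

    # Program characterization: AbOp counts from an instrumented run (hypothetical).
    program_a = {
        "add_int":    5_000_000,
        "mul_double": 1_200_000,
        "mem_load":   8_000_000,
        "branch":     2_500_000,
    }

    print(f"estimated run time: {predict_run_time(program_a, machine_m):.3f} s")

Under this reading, the machine characterizer produces the per-operation times once per machine, and the instrumented benchmark run produces the counts once per program, so any machine/program pairing can be estimated without rerunning the benchmark on that machine.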