Abstract-Reducing a program's instruction count can improve cache behavior and bandwidth utilization, lower power consumption, and increase overall performance. Nonetheless, code density is an often overlooked feature in studying processor architectures. We hand-optimize an assembly language embedded benchmark for size on 21 different instruction set architectures, finding up to a factor of three difference in code sizes from ISA alone. We find that the architectural features that contribute most heavily to code density are instruction length, number of registers, availability of a zero register, bit-width, hardware divide units, number of instruction operands, and the availability of unaligned loads and stores. We extend our results to investigate operating system, compiler, and system library effects on code density. We find that the executable starting address, executable format, and system call interface all affect program size. While ISA effects are important, the efficiency of the entire system stack must be taken into account when developing a new dense instruction set architecture.
I. BENEFITS OF CODE DENSITY

Dense code yields many benefits. The L1 instruction cache can hold more instructions, which usually results in fewer cache misses [1]. Less bandwidth is required to fetch instructions from memory and disk [2], and less storage is needed to hold program images. With fewer instructions, more data fits in a combined L2 cache. Also, on modern multi-threaded processors, multiple threads share limited L1 cache space, so having fewer instructions can be advantageous. Denser code causes fewer TLB misses, since the code requires fewer virtual memory pages. Modern Intel processors, for instance, can execute compact loops entirely from the instruction buffer, removing the need for L1 I-cache accesses. Finally, the ability to consistently generate denser code can conserve power, since it enables smaller microarchitectural structures and uses less bandwidth.

Obviously, these benefits can come at a cost. For example, a denser ISA might require larger (and thus slower) pipeline decode stages, more complicated compilers, smaller logical register set sizes (due to limitations in the number of bits available in instructions), or even slower and more complex functional units. Compilers tend to optimize for performance, not size (even though the two are inextricably related): obtaining optimal code density often requires hand-tuned assembly language, which represents yet another tradeoff in terms of programmer time and maintainability. The current push for using CISC chips in the embedded market [8] forces a re-evaluation of existing ISAs.
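The cache-capacity argument above can be made concrete with a little arithmetic. The sketch below is purely illustrative; the cache size and average instruction lengths are assumptions for the example, not measurements from this study:

```python
# Back-of-the-envelope: how many instructions fit in an L1 I-cache
# for different average instruction lengths. All parameters here are
# illustrative assumptions, not figures from the study.

L1_ICACHE_BYTES = 32 * 1024  # an assumed, common L1 I-cache size


def instructions_in_cache(avg_insn_bytes, cache_bytes=L1_ICACHE_BYTES):
    """Approximate instruction capacity of a cache, ignoring
    cache-line packing effects and alignment padding."""
    return cache_bytes // avg_insn_bytes


for name, size in [("16-bit compressed encoding", 2),
                   ("variable-length CISC (~3 B avg)", 3),
                   ("fixed 32-bit RISC", 4)]:
    print(f"{name:32s}: {instructions_in_cache(size):6d} instructions")
```

Under these assumptions, halving the average instruction length doubles the number of instructions resident in the same cache, which is the mechanism behind the cache-miss and bandwidth benefits described above.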
II. METHODOLOGY

Investigations of code density often use microbenchmarks (which tend to be short and not representative of actual workloads) or else industry standard benchmarks (which are written in high-level languages and thus are limited by compiler code generation capabilities). As a compromise, we take an actual system utility, but convert it into pure assembly language in order to...