Abstract-Reducing a program's instruction count can improve cache behavior and bandwidth utilization, lower power consumption, and increase overall performance. Nonetheless, code density is an often overlooked feature in studying processor architectures. We hand-optimize an assembly language embedded benchmark for size on 21 different instruction set architectures, finding up to a factor of three difference in code sizes from ISA alone. We find that the architectural features that contribute most heavily to code density are instruction length, number of registers, availability of a zero register, bit-width, hardware divide units, number of instruction operands, and the availability of unaligned loads and stores. We extend our results to investigate operating system, compiler, and system library effects on code density. We find that the executable starting address, executable format, and system call interface all affect program size. While ISA effects are important, the efficiency of the entire system stack must be taken into account when developing a new dense instruction set architecture.
I. BENEFITS OF CODE DENSITY

Dense code yields many benefits. The L1 instruction cache can hold more instructions, which usually results in fewer cache misses [1]. Less bandwidth is required to fetch instructions from memory and disk [2], and less storage is needed to hold program images. With fewer instructions, more data fits in a combined L2 cache. Also, on modern multi-threaded processors, multiple threads share limited L1 cache space, so having fewer instructions can be advantageous. Denser code causes fewer TLB misses, since the code requires fewer virtual memory pages. Modern Intel processors, for instance, can execute compact loops entirely from the instruction buffer, removing the need for L1 I-cache accesses. Finally, the ability to consistently generate denser code can conserve power, since it enables smaller microarchitectural structures and uses less bandwidth.

Obviously, these benefits can come at a cost. For example, a denser ISA might require larger (and thus slower) pipeline decode stages, more complicated compilers, smaller logical register set sizes (due to limitations in the number of bits available in instructions), or even slower and more complex functional units. Compilers tend to optimize for performance, not size (even though the two are inextricably related): obtaining optimal code density often requires hand-tuned assembly language, which represents yet another tradeoff in terms of programmer time and maintainability. The current push for using CISC chips in the embedded market [8] forces a re-evaluation of existing ISAs.
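The cache-capacity argument above can be made concrete with a little arithmetic. The sketch below is purely illustrative; the cache size and average instruction lengths are assumptions for the example, not measurements from this study:

```python
# Back-of-the-envelope: how many instructions fit in an L1 I-cache
# for different average instruction lengths. All parameters here are
# illustrative assumptions, not figures from the study.

L1_ICACHE_BYTES = 32 * 1024  # an assumed, common L1 I-cache size


def instructions_in_cache(avg_insn_bytes, cache_bytes=L1_ICACHE_BYTES):
    """Approximate instruction capacity of a cache, ignoring
    cache-line packing effects and alignment padding."""
    return cache_bytes // avg_insn_bytes


for name, size in [("16-bit compressed encoding", 2),
                   ("variable-length CISC (~3 B avg)", 3),
                   ("fixed 32-bit RISC", 4)]:
    print(f"{name:32s}: {instructions_in_cache(size):6d} instructions")
```

Under these assumptions, halving the average instruction length doubles the number of instructions resident in the same cache, which is the mechanism behind the cache-miss and bandwidth benefits described above.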
II. METHODOLOGY

Investigations of code density often use microbenchmarks (which tend to be short and not representative of actual workloads) or else industry standard benchmarks (which are written in high-level languages and thus are limited by compiler code generation capabilities). As a compromise, we take an actual system utility, but convert it into pure assembly language in order to...