In recent years, on-chip trace generation has been recognized as a solution to the debugging of increasingly complex software. An execution trace can be seen as the most fundamentally useful type of trace, allowing the execution path of software to be determined post hoc. However, the bandwidth required to output such a trace can be excessive. Our architecture-aware trace compression (AATC) scheme adds an on-chip branch predictor and branch target buffer to reduce the volume of execution trace data in real time through on-chip compression. Novel redundancy reduction strategies are employed, most notably exploiting the widespread use of linked branches and the compiler-driven movement of return addresses between link register, stack, and program counter. In doing so, the volume of branch target addresses is reduced by 52%, while other algorithmic improvements further decrease trace volume. An analysis of spatial and temporal redundancy in the trace stream allows encoding strategies to be compared, so that compression performance can be increased systematically. A combination of differential, Fibonacci, VarLen, and Move-to-Front encodings is chosen to produce two compressor variants: a performance-focused xAATC that encodes 56.5 instructions/bit using 24,133 gates and an area-efficient fAATC that encodes 48.1 instructions/bit using only 9,854 gates.
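
To make the named encodings concrete, the following is a minimal software sketch of three of them (differential, Move-to-Front, and Fibonacci), each shown in isolation. It is an illustration only, not the AATC hardware pipeline: the function names, the zigzag mapping of signed deltas to positive integers, and the sample addresses are all assumptions introduced here, and VarLen is omitted because its format is not defined in this text.

```python
def delta_encode(addresses):
    """Differential encoding: keep the first value, then successive differences."""
    if not addresses:
        return []
    deltas = [addresses[0]]
    for prev, cur in zip(addresses, addresses[1:]):
        deltas.append(cur - prev)
    return deltas


def zigzag(d):
    """Map a signed delta to a positive integer (assumed mapping, not from the paper).

    0 -> 1, -1 -> 2, 1 -> 3, -2 -> 4, 2 -> 5, ...
    Fibonacci coding needs inputs >= 1, hence the offset.
    """
    return 2 * d + 1 if d >= 0 else -2 * d


def fib_encode(n):
    """Fibonacci code of a positive integer, as a string of '0'/'1'.

    The Zeckendorf representation is written lowest term first and
    terminated by an extra '1', so every codeword ends in '11'.
    """
    if n < 1:
        raise ValueError("Fibonacci coding is defined for integers >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first Fibonacci number greater than n
    bits = []
    for f in reversed(fibs):  # greedy choice, largest term first
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    return "".join(reversed(bits)) + "1"


def mtf_encode(symbols, table):
    """Move-to-Front: emit each symbol's index, then move it to the front.

    Recently seen symbols get small indices, which suits temporal redundancy.
    """
    table = list(table)  # work on a copy so the caller's table is untouched
    out = []
    for s in symbols:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


# Hypothetical branch target addresses, chosen only for illustration.
targets = [0x1000, 0x1004, 0x1008, 0x0FF0]
deltas = delta_encode(targets)
codes = [fib_encode(zigzag(d)) for d in deltas[1:]]
```

Small, frequently repeated values receive short codewords under both Fibonacci and Move-to-Front coding, which is why they pair naturally with a differential stage that turns nearby addresses into small deltas.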