This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1-3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
Trace-driven simulation is often used in the design of computer systems, especially caches and translation Iookaside buffers. Capturing address traces to drive such simulations has been problematic, often involving 1000:1 software overhead to trace a target workload, and/or mechanisms that cause significant distortions in the recorded data. A new technique for capturing address traces has been developed to use a processor's micrecode to record addresses in a reserved part of main memory as a side effect of normal execution. An experimental implementation of this technique on a VAX 1 8200 processor shows a number of advantages over previous techniques, including fewer distortions of the address trace and a hundred times faster recording. With this technique, it is possible to gather full operating-system traces of multi-ta.~Irlng workloads.
This paper describes the DIGlTAL Continuous Profiling Infrastmcture, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executable& and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/secper333-MHz processor), yet with low overhead (l-3% slowdown for most workloads).Analysis tools supplied with the profiling system use the sample data to produce an accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.