SIGMA: A Simulator Infrastructure to Guide Memory Analysis

DeRose, Luiz; Ekanadham, Kattamuri; Hollingsworth, Jeffrey K.; Sbaraglia, Simone

doi:10.1109/sc.2002.10055

Cited by 40 publications

(35 citation statements)

References 19 publications


“…The compression algorithm maintains a queue of MPI events and attempts to greedily compress the first matching sequence, an approach that is loosely based on the SIGMA scheme for memory analysis [9]. Our algorithm uses two sequences, the "target" and the "match" sequence, each with its own head and tail pointer.…”

Section: Intra-node/task-level Trace Compression (mentioning)

confidence: 99%

ScalaTrace: Scalable compression and replay of communication traces for high-performance computing

Noeth, Ratn, Mueller et al. 2009

Journal of Parallel and Distributed Computing

117

111


Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code/system complexity and long execution times. While many tools to study this behavior have been developed, these approaches either aggregate information in a lossy way through high-level statistics or produce huge trace files that are hard to handle. We contribute an approach that provides communication traces that are orders of magnitude smaller, if not near-constant in size, regardless of the number of nodes, while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events that are capable of extracting an application's communication structure. We further present a replay mechanism for the traces generated by our approach and discuss results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and beyond. To the best of our knowledge, such a concise representation of MPI traces in a scalable manner, combined with deterministic MPI call replay, is without precedent.

Key words: High-Performance Computing, Scalability, Communication Tracing

PACS: 07.05.Bx

An earlier version of this paper appeared at IPDPS'07 [20]. This journal version extends the earlier paper with novel domain-specific intra- and inter-node compression techniques, a completely redesigned inter-node merge algorithm, novel results with a larger class of codes resulting in near-constant trace sizes, a study to identify the timestep loop, and extended related work.
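The greedy matching described in the first quoted statement (a queue of MPI events, a "target" and a "match" sequence) can be illustrated with a small Python sketch. It is a hypothetical simplification, not ScalaTrace's implementation. Events are plain strings, the target is the tail of the queue, and the match is the equally long sequence immediately before it. When the two are equal, they are folded into a repeat block. The names `Loop`, `append_and_compress`, `compress`, and `expand` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Loop:
    """Hypothetical repeat block: `body` executed `count` times."""
    body: Tuple["Item", ...]
    count: int


Item = Union[str, Loop]


def append_and_compress(queue: List[Item], event: str) -> None:
    """Append one MPI event and greedily fold repeated tails into Loop blocks."""
    queue.append(event)
    changed = True
    while changed:  # keep folding until no rule applies
        changed = False
        n = len(queue)
        for w in range(1, n):
            target = tuple(queue[n - w:])  # "target": the last w items
            prev = queue[n - w - 1]
            # Case 1: the item before the target is a loop with the same body.
            if isinstance(prev, Loop) and prev.body == target:
                del queue[n - w - 1:]
                queue.append(Loop(prev.body, prev.count + 1))
                changed = True
                break
            # Case 2: the "match" (the w items just before the target) equals it.
            if 2 * w <= n and tuple(queue[n - 2 * w:n - w]) == target:
                del queue[n - 2 * w:]
                queue.append(Loop(target, 2))
                changed = True
                break


def compress(events: Sequence[str]) -> List[Item]:
    queue: List[Item] = []
    for event in events:
        append_and_compress(queue, event)
    return queue


def expand(items: Sequence[Item]) -> List[str]:
    """Inverse of compress: unroll every Loop back into the flat event list."""
    out: List[str] = []
    for item in items:
        if isinstance(item, Loop):
            for _ in range(item.count):
                out.extend(expand(item.body))
        else:
            out.append(item)
    return out
```

For example, `compress(["MPI_Send", "MPI_Recv"] * 3 + ["MPI_Barrier"])` yields a single `Loop(("MPI_Send", "MPI_Recv"), 3)` followed by `"MPI_Barrier"`. The sketch checks the preceding loop before comparing the two sequences. This lets a third or later repetition extend an existing block instead of creating a nested one.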



“…Constant and loop-varying addresses need to be encoded only once in the compressed trace, but all chaotic addresses must be stored separately. Control flow analysis to extract loop information can be avoided if a program is instrumented before tracing [DeRose et al 2002]. However, the limitation of this technique is that the iteration count for inner loops must be constant.…”

Section: Related Work (mentioning)

confidence: 99%

An efficient single-pass trace compression technique utilizing instruction streams

Milenković 2007

ACM Trans. Model. Comput. Simul.


Trace-driven simulations have been widely used in computer architecture for quantitative evaluations of new ideas and design prototypes. Efficient trace compression and fast decompression are crucial for contemporary workloads, as representative benchmarks grow in size and number. This article presents Stream-Based Compression (SBC), a novel technique for single-pass compression of address traces. The SBC technique compresses both instruction and data addresses by associating them with a particular instruction stream, that is, a block of consecutively executing instructions. The compressed instruction trace is a trace of instruction stream identifiers. The compressed data address trace encompasses the data address stride and the number of repetitions for each memory-referencing instruction in a stream, ordered by the corresponding stream appearances in the trace. SBC reduces the size of SPEC CPU2000 Dinero instruction and data address traces by factors of 18 to 309, outperforming the best trace compression techniques presented in the open literature. SBC can be successfully combined with general-purpose compression techniques: the combined SBC-gzip compression ratio ranges from 80 to 35,595, and the SBC-bzip2 compression ratio ranges from 75 to 191,257. Moreover, SBC outperforms other trace compression techniques when both decompression time and compression time are considered. This article also shows how the SBC algorithm can be modified for hardware implementation with very modest resources and only a minor loss in compression ratio.
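A simplified Python sketch can illustrate two related ideas. The first is the address classification in the quoted statement (constant, loop-varying, chaotic). The second is SBC's per-instruction stride-and-repetition encoding. This is not the SBC algorithm or SIGMA's encoding. It ignores instruction streams entirely and groups a hypothetical trace of (pc, address) pairs by instruction address. It then greedily encodes each instruction's addresses as runs of (start, stride, count). The names `Run`, `encode`, `decode`, `classify`, and `compress_trace` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Run:
    """Addresses start, start + stride, ..., repeated `count` times in total."""
    start: int
    stride: int
    count: int


def encode(addresses: List[int]) -> List[Run]:
    """Greedily split one instruction's address sequence into strided runs."""
    runs: List[Run] = []
    for a in addresses:
        if runs:
            r = runs[-1]
            if r.count == 1:
                # The second address of a run fixes its stride.
                r.stride = a - r.start
                r.count = 2
                continue
            if a == r.start + r.stride * r.count:
                r.count += 1  # address continues the current stride
                continue
        runs.append(Run(a, 0, 1))  # start a new run
    return runs


def decode(runs: List[Run]) -> List[int]:
    return [r.start + r.stride * i for r in runs for i in range(r.count)]


def classify(runs: List[Run]) -> str:
    """One run with stride 0 is constant; one strided run is loop-varying."""
    if len(runs) == 1:
        return "constant" if runs[0].stride == 0 else "loop-varying"
    return "chaotic"


def compress_trace(trace: List[Tuple[int, int]]) -> Dict[int, List[Run]]:
    """Group a hypothetical (pc, address) trace by pc and encode each group."""
    per_pc: Dict[int, List[int]] = {}
    for pc, addr in trace:
        per_pc.setdefault(pc, []).append(addr)
    return {pc: encode(addrs) for pc, addrs in per_pc.items()}
```

In this sketch, a constant address collapses to one run with stride 0 and a loop-varying address to one run with a nonzero stride. Both are stored only once. Chaotic addresses leave many short runs, so they gain little or nothing from the encoding.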


“…From previous studies with a variety of HPC applications [9,4,11,12,10,17], we have found that these five dimensions (CPU, memory, message passing, threads and I/O) provide an excellent starting point for a programmer to understand the performance behavior of their applications. The dimensions of performance data provided in our current framework are¹:…”

Section: Overview of the Productivity Centered Framework for Applica… (mentioning)

confidence: 99%

A Productivity Centered Application Performance Tuning Framework

Sbaraglia, Wen, Seelam et al. 2007

Proceedings of the 2nd International ICST Conference on Performance Evaluation Methodologies and Tools

Self Cite


In response to the productivity challenge of the U.S. DARPA HPCS initiative, we have developed a methodology that provides an extremely simple and pain-free interface through which scientists can collect rich performance data from selected parts of an execution, digest the data at a very high level, and plan for improvements. This process can be easily repeated, each time refining the selection of parts of the application and revising the granularity of data collected, until complete insight is gained about bottlenecks. A distinct feature of our approach is that the framework is independent of the features being examined. Recognizing that the features to be examined change with systems/applications and also with the depth at which an aspect is being examined, our framework provides an easy interface to continually add new features for examination. Furthermore, many different features can be collected simultaneously and examined in a non-interfering manner. Finally, all this is accomplished without changing the source code in any manner. We believe that this is an ideal platform for building knowledge-based repositories for automatic performance tuning, which is the subject of our future study.

In this paper, we describe our productivity centered framework for application performance tuning. It comprises three features: a unique source code and binary instrumentation feature, a versatile user interface that brings all the sophisticated capabilities of the binary instrumentation to the user at a higher level of abstraction, and the functionality to collect different dimensions of performance data. The results of execution are all in terms of source-level names, and at no point does the scientist need to worry about low-level details of instrumentation. We believe that it is this ability to decipher performance impacts at the source level that makes scientists highly productive in understanding, directing, and tuning the behavior of the computing system.

