Scalable methods for monitoring and detecting behavioral equivalence classes in scientific codes

Gamblin, Todd; Fowler, Rob; Reed, Daniel A.

doi:10.1109/ipdps.2008.4536236

Cited by 14 publications

(17 citation statements)

References 12 publications


“…We demonstrate its utility by clustering performance trace data. Prior work showed that statistical sampling could reduce the volume of performance-trace data by over an order of magnitude on comparatively small systems for performance clusters that are known a priori [10,27]. Using our algorithm, we are able to use clustering information to stratify on-line performance traces adaptively, and we achieve data reduction of four orders of magnitude for much larger systems.…”

Section: Introduction (mentioning)

confidence: 93%

Clustering performance data efficiently at massive scales

Gamblin

Supinski

Schulz

et al. 2010

Proceedings of the 24th ACM International Conference on Supercomputing

Self Cite


Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune their applications and to exploit these systems fully. However, extreme scales pose unique challenges for performance-tuning tools, which can generate significant volumes of I/O. Compute-to-I/O ratios have increased drastically as systems have grown, and the I/O systems of large machines can handle the peak load from only a small fraction of cores. Tool developers need efficient techniques to analyze and to reduce performance data from large numbers of cores. We introduce CAPEK, a novel parallel clustering algorithm that enables in-situ analysis of performance data at run time. Our algorithm scales sub-linearly to 131,072 processes, running in less than one second even at that scale, which is fast enough for on-line use in production runs. The CAPEK implementation is fully generic and can be used for many types of analysis. We demonstrate its application to statistical trace sampling. Specifically, we use our algorithm to compute efficiently stratified sampling strategies for traces at run time. We show that such stratification can result in data-volume reduction of up to four orders of magnitude on current large-scale systems, with potential for greater reductions for future extreme-scale systems.


“…HPCToolkit collects call path profiles [9,1]. To further reduce the overhead involved in profiling, Gamblin et al utilize statistical sampling and parallel clustering techniques to reduce the number of parallel processes from which performance data is collected, and thus improve the scalability of parallel profiling tools [12,11,10]. In contrast to the lossless tracing approach, tools like mpiP generally report simple and high-level information that is only suitable for a superficial understanding of performance problems.…”

Section: Related Work (mentioning)

confidence: 99%

Elastic and scalable tracing and accurate replay of non-deterministic events

Mueller

2013

Proceedings of the 27th International ACM Conference on International Conference on Supercomputing


SCALATRACE represents the state-of-the-art of parallel application tracing for high performance computing (HPC). This paper presents SCALATRACE II, a next generation tracer that delivers even higher trace compression capability, even when events are not always regular. In this work, we contribute a spectrum of novel compression and replay techniques that are fundamentally different from our past approaches. SCALATRACE II features a redesigned low-level encoding scheme of trace data such that data elements are elastic and self-explanatory. With this new encoding scheme, trace compression is enhanced by introducing innovative intra-node and inter-node trace compression algorithms that guarantee high compression rates in a loop structure agnostic fashion. In practice, the improved compression scheme is particularly efficient for scientific codes that demonstrate inconsistent behavior across time steps and nodes. A novel approach is further contributed to probabilistically replay sequences of non-deterministic events. To assess the compression efficacy of SCALATRACE II, we conduct experiments not only with computational kernels but also a real-world application, the Parallel Ocean Program (POP). Compared to the first generation SCALATRACE, we observe key improvements on trace compression for benchmarks with inconsistent time step behavior and diverging task level behavior while retaining timing accuracy even under probabilistic replay.


“…Also, ScalaTrace does not involve inter-thread compression. Trace compression discussed in [8] is based on statistical sampling and results in lossy compression and do not preserve order.…”

Section: Related Work (mentioning)

confidence: 99%

Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs

Budanur

Mueller

Gamblin

2011

SIGMETRICS Perform. Eval. Rev.

Self Cite


Concurrency levels in large-scale supercomputers are rising exponentially, and shared-memory nodes with hundreds of cores and non-uniform memory access latencies are expected within the next decade. However, even current petascale systems with tens of cores per node suffer from memory bottlenecks. As core counts increase, memory issues will become critical for the performance of large-scale supercomputers. Trace analysis tools are thus vital for diagnosing the root causes of memory problems. However, existing memory tracing tools are expensive due to prohibitively large trace sizes, or they collect only statistical summaries and omit potentially valuable information. In this paper, we present ScalaMemTrace, a novel technique for collecting memory traces in a scalable manner. ScalaMemTrace builds on prior trace methods with aggressive compression techniques to allow lossless representation of memory traces for dense algebraic kernels, with near-constant trace size irrespective of the problem size or the number of threads. We further introduce a replay mechanism for ScalaMemTrace traces, and discuss the results of our prototype implementation on the x86-64 architecture.
