Automatic construction of coordinated performance skeletons

Subhlok, Jaspal; Xu, Qiang

doi:10.1109/ipdps.2008.4536405

Cited by 8 publications

(5 citation statements)

References 6 publications

“…The steps in the skeleton construction procedure are outlined in Figure 2. A framework for scalable skeleton construction has been developed and the effectiveness of performance skeletons for performance prediction have been evaluated [19], [13].…”

Section: Motivation and Context (mentioning)

confidence: 99%

Logicalization of communication traces from parallel execution

Subhlok

Zheng

et al. 2009

2009 IEEE International Symposium on Workload Characterization (IISWC)

Self Cite

Abstract: Communication traces are integral to performance modeling and analysis of parallel programs. However, execution on a large number of nodes results in a large trace volume that is cumbersome and expensive to analyze. This paper presents an automatic framework to convert all process traces corresponding to the parallel execution of an SPMD MPI program into a single logical trace. First, the application communication matrix is generated from process traces. Next, topology identification is performed based on the underlying communication structure and independent of the way ranks (or numbers) are assigned to processes. Finally, message exchanges between physical processes are converted into logical message exchanges that represent similar message exchanges across all processes, resulting in a trace volume reduction approximately equal to the number of processes executing the application. This logicalization framework has been implemented and the results report on its performance and effectiveness.
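The first step described in this abstract, building a communication matrix from per-process traces, can be illustrated with a minimal Python sketch. The trace format (a dict mapping each sender rank to a list of `(dest_rank, num_bytes)` events) and the function names are assumptions made for this example and are not taken from the paper. The `degree_signature` helper is only a simple rank-independent summary; it is not the paper's topology identification algorithm.

```python
def communication_matrix(traces, num_procs):
    """Sum bytes sent between every pair of ranks.

    traces: dict mapping sender rank -> list of (dest_rank, num_bytes)
            send events. This format is hypothetical.
    """
    matrix = [[0] * num_procs for _ in range(num_procs)]
    for src, events in traces.items():
        for dst, nbytes in events:
            matrix[src][dst] += nbytes
    return matrix


def degree_signature(matrix):
    """Sorted list of out-degrees (number of distinct destinations per rank).

    Renumbering the ranks permutes rows and columns together, which
    leaves this sorted list unchanged.
    """
    return sorted(sum(1 for v in row if v > 0) for row in matrix)


# Four processes in a ring, each sending 100 bytes to its right neighbour.
ring = {r: [((r + 1) % 4, 100)] for r in range(4)}
ring_matrix = communication_matrix(ring, 4)

# The same ring with ranks assigned in a different order: 0 -> 2 -> 1 -> 3 -> 0.
shuffled = {0: [(2, 100)], 2: [(1, 100)], 1: [(3, 100)], 3: [(0, 100)]}
shuffled_matrix = communication_matrix(shuffled, 4)
```

Both matrices differ entry by entry, yet `degree_signature` returns the same value for each, which is the kind of rank-independent property that topology identification relies on.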

“…We refer to the SST documentation [25] for more information. We implemented the skeletons manually, automatic skeletonization is subject of ongoing research [35]. The SST network simulator models the most common MPI calls (including collective and point-to-point communication, as well as non-blocking MPI calls).…”

Section: Phase 3: Many-node Network Simulation (mentioning)

confidence: 99%

Projecting Performance for PIUMA using Down-Scaled Simulation

Eyerman

Heirman

Demir

et al. 2020

2020 IEEE High Performance Extreme Computing Conference (HPEC)

Accurate performance estimation of future many-node machines is challenging because it requires detailed simulation models of both node and network. However, simulating the full system in detail is unfeasible in terms of compute and memory resources. State-of-the-art techniques use a two-phase approach that combines detailed simulation of a single node with network-only simulation of the full system. We show that these techniques, where the detailed node simulation is done in isolation, are inaccurate because they ignore two important node-level effects: compute time variability, and inter-node communication. We propose a novel three-stage simulation method to allow scalable and accurate many-node simulation, combining native profiling, detailed node simulation and high-level network simulation. By including timing variability and the impact of external nodes, our method leads to more accurate estimates. We validate our technique against measurements on a multi-node cluster, and report an average 6.7% error on 64 nodes (maximum error of 12%), compared to on average 27% error and up to 54% when timing variability and the scaling overhead are ignored. At higher node counts, the prediction error of ignoring variable timings and scaling overhead continues to increase compared to our technique, and may lead to selecting the wrong optimal cluster configuration. Using our technique, we are able to accurately project performance to thousands of nodes within a day of simulation time, using only a single or a few simulation hosts. Our method can be used to quickly explore large many-node design spaces, including node micro-architecture, node count and network configuration.

“…In addition to inferring models, concurrent system logs can be used to detect anomalies [35,44,57], identify performance bugs [52,53], and mine temporal system properties [9,17,59]. Our focus is on concurrency and on extracting a model that can aid understanding of more general system behavior.…”

Section: Related Work (mentioning)

confidence: 99%

Inferring models of concurrent systems from logs of their behavior with CSight

Beschastnikh

Brun

Ernst

et al. 2014

Proceedings of the 36th International Conference on Software Engineering

Concurrent systems are notoriously difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process, and documentation is often incomplete and out of sync with the implementation. To provide developers with more insight into concurrent systems, we developed CSight. CSight mines logs of a system's executions to infer a concise and accurate model of that system's behavior, in the form of a communicating finite state machine (CFSM). Engineers can use the inferred CFSM model to understand complex behavior, detect anomalies, debug, and increase confidence in the correctness of their implementations. CSight's only requirement is that the logged events have vector timestamps. We provide a tool that automatically adds vector timestamps to system logs. Our tool prototypes are available at http://synoptic.googlecode.com/. This paper presents algorithms for inferring CFSM models from traces of concurrent systems, proves them correct, provides an implementation, and evaluates the implementation in two ways: by running it on logs from three different networked systems and via a user study that focused on bug finding. Our evaluation finds that CSight infers accurate models that can help developers find bugs.
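The vector timestamps that this abstract names as CSight's only requirement can be illustrated with a minimal Python sketch of the standard vector clock technique. This is a generic illustration, not CSight's timestamping tool; the class and function names are hypothetical.

```python
class VectorClock:
    """Standard vector clock for one process in a fixed-size group."""

    def __init__(self, pid, num_procs):
        self.pid = pid
        self.clock = [0] * num_procs

    def local_event(self):
        # Every logged event advances the process's own component.
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def send(self):
        # A send is an event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, msg_timestamp):
        # Merge by element-wise maximum, then count the receive itself.
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_timestamp)]
        return self.local_event()


def happens_before(a, b):
    """True if timestamp a is causally before timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
ts_send = p0.send()            # (1, 0)
ts_local = p1.local_event()    # (0, 1)
ts_recv = p1.receive(ts_send)  # (1, 2)
```

Here `ts_send` happens before `ts_recv`, while `ts_send` and `ts_local` are concurrent because neither timestamp is component-wise less than or equal to the other. A partial order of this kind is what lets logged events from different processes be related to each other.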
