The Tau Parallel Performance System

Shende, Sameer; Malony, Allen D.

doi:10.1177/1094342006064482

Cited by 933 publications

(471 citation statements)

References 24 publications

Supporting

Mentioning

468

Contrasting

Unclassified

“…However, such information is generally about the whole job, and more fine-grained information would be helpful to understand the individual steps of a large parallel workflow. Alternatively, the workflow management system could record the performance information of each step of a workflow [24], a profiler may be used to automatically capture detailed performance information [31], or the user may instrument selected operations with some library functions [34]. In these cases, the performance data is typically captured into log files.…”

Section: Related Work (mentioning)

confidence: 99%

“…These performance tools lack distributing and parallelizing the computations of the analysis to large number of machines. Some tools such as Tau [31] and Vampir [9] can parallelize computational loads MPI processes, and potentially these MPI processes can be extended to distribute multiple loads. However, this extension involves significant implementation challenges due to synchronization and inter-process communication complexities and lack of fault tolerance support.…”

Section: Related Work (mentioning)

confidence: 99%

“…Shende et al [31] designed Tau to support monitoring parallel applications by automatically inserting instrumentation routines. Böhme et al [7] presented an automatic mechanism which performs instrumentation during compilation in order to identify the causes of waiting periods for MPI applications.…”

Section: Related Work (mentioning)

confidence: 99%

Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

Yoo, Koo, Cao, et al. 2016

Conquering Big Data With High Performance Computing

Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance involving terabytes or petabytes of workflow data or measurement data of the executions, from complex workflows over a large number of nodes and multiple parallel task executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework, using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from the genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.

“…Outside of special network topology considerations on BlueGene/Q, CTF does not employ any optimizations which are specific to an architecture or an instruction set. Performance profiling is done by hand and with TAU [48].…”

Section: Application Performance (mentioning)

confidence: 99%

A massively parallel tensor contraction framework for coupled-cluster computations

Solomonik, Matthews, Hammond, et al. 2014

Journal of Parallel and Distributed Computing

“…These tools very often support the optimization of the mapping process by tracing or profiling the applications during run-time. Examples of these tools are TAU [24], HPC Toolkit [25], Open|Speedshop [26], and Scalasca [27]. Since the optimization is done at run-time these tools usually require that the implementation of the algorithm is completed before the analysis.…”

mentioning

confidence: 99%

Model-Driven Approach for Supporting the Mapping of Parallel Algorithms to Parallel Computing Platforms

Arkin, Tekinerdoğan, İmre 2013

Lecture Notes in Computer Science

Abstract. The trend from single processor to parallel computer architectures has increased the importance of parallel computing. To support parallel computing it is important to map parallel algorithms to a computing platform that consists of multiple parallel processing nodes. In general different alternative mappings can be defined that perform differently with respect to the quality requirements for power consumption, efficiency and memory usage. The mapping process can be carried out manually for platforms with a limited number of processing nodes. However, for exascale computing in which hundreds of thousands of processing nodes are applied, the mapping process soon becomes intractable. To assist the parallel computing engineer we provide a model-driven approach to analyze, model, and select feasible mappings. We describe the developed toolset that implements the corresponding approach together with the required metamodels and model transformations. We illustrate our approach for the well-known complete exchange algorithm in parallel computing.
