2008
DOI: 10.1109/ipdps.2008.4536236

Scalable methods for monitoring and detecting behavioral equivalence classes in scientific codes

Abstract: Emerging petascale systems will have many hundreds of thousands of processors, but traditional task-level tracing tools already fail to scale to much smaller systems because the I/O backbones of these systems cannot handle the peak load offered by their cores. Complete event traces of all processes are thus infeasible. To retain the benefits of detailed performance measurement while reducing volume of collected data, we developed AMPL, a general-purpose toolkit that reduces data volume using stratified samplin…

Cited by 14 publications
(17 citation statements)
References 12 publications
“…We demonstrate its utility by clustering performance trace data. Prior work showed that statistical sampling could reduce the volume of performance-trace data by over an order of magnitude on comparatively small systems for performance clusters that are known a priori [10,27]. Using our algorithm, we are able to use clustering information to stratify on-line performance traces adaptively, and we achieve data reduction of four orders of magnitude for much larger systems.…”
Section: Introduction (mentioning)
confidence: 93%
“…HPCToolkit collects call path profiles [9,1]. To further reduce the overhead involved in profiling, Gamblin et al utilize statistical sampling and parallel clustering techniques to reduce the number of parallel processes from which performance data is collected, and thus improve the scalability of parallel profiling tools [12,11,10]. In contrast to the lossless tracing approach, tools like mpiP generally report simple and high-level information that is only suitable for a superficial understanding of performance problems.…”
Section: Related Work (mentioning)
confidence: 99%
“…Also, ScalaTrace does not involve inter-thread compression. Trace compression discussed in [8] is based on statistical sampling and results in lossy compression and do not preserve order.…”
Section: Related Work (mentioning)
confidence: 99%