Proceedings International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2003.1213517

Experiences and lessons learned with a portable interface to hardware performance counters

Abstract: The PAPI project has defined and implemented a cross-platform interface to the hardware counters available on most modern microprocessors. The interface has gained widespread use and acceptance from hardware vendors, users, and tool developers. This paper reports on experiences with the community-based open-source effort to define the PAPI specification and implement it on a variety of platforms. Collaborations with tool developers who have incorporated support for PAPI are described. Issues related to interpre…

Cited by 38 publications (25 citation statements)
References 9 publications
“…It just reports one single number. Also in the context of PAPI, Dongarra et al [3] mention potential sources of inaccuracy in counter measurements. They point out issues such as the extra instructions and system calls required to access counters, and indirect effects like the pollution of caches due to instrumentation code, but they do not present any experimental data.…”
Section: Related Work (mentioning)
confidence: 99%
“…≥ 2.5). To monitor the performance of code sections, we use PAPI [22] which provides an interface to control and access the processor hardware performance counters.…”
Section: Analysis and Instrumentation Framework (mentioning)
confidence: 99%
“…MR does up to 66x fewer flops than DC, and never more than twice as many, with a median of 8.3x fewer flops than DC for large matrices. Figure 4.3 shows the flop counts of each algorithm relative to the one of MR. PAPI [15] was used to obtain the flop counts. Runtime of all algorithms on Opteron.…”
Section: Performance Details For Practical (mentioning)
confidence: 99%