Performance Evaluation and Benchmarking 2005
DOI: 10.1201/9781420037425.ch12

Performance Monitoring on the POWER5™ Microprocessor

Cited by 15 publications (17 citation statements)
References 3 publications

“…Collecting CPI stacks on out-of-order cores is more complicated because of various overlap effects between miss events, e.g., a long-latency load may hide the latency of another independent long-latency load miss or mispredicted branch, etc. Recent commercial processors such as IBM Power5 [23] and Intel Sandy Bridge [12] however provide support for computing memory stall components. PIE scheduling also requires the number of LLC misses and the number of dynamically executed instructions, which can be measured using existing hardware performance counters.…”
Section: Hardware Support (mentioning)
confidence: 99%
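The counter arithmetic described in this statement can be sketched as follows. This is a minimal illustration, assuming hypothetical counter names (they are not actual POWER5 or Sandy Bridge event names) and assuming the hardware already supplies a memory-stall cycle count, as the quoted passage says those processors support. It derives a two-component CPI stack and the LLC misses-per-kilo-instruction (MPKI) rate from one interval of readings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CounterSample:
    """One interval of raw counter readings (field names are hypothetical)."""
    cycles: int
    instructions: int          # dynamically executed (completed) instructions
    llc_misses: int            # last-level cache misses
    memory_stall_cycles: int   # cycles the hardware attributes to memory stalls


def cpi_stack(s: CounterSample) -> dict[str, float]:
    """Split total CPI into a memory-stall component and everything else."""
    if s.instructions <= 0:
        raise ValueError("need a positive instruction count")
    total = s.cycles / s.instructions
    memory = s.memory_stall_cycles / s.instructions
    return {"total": total, "memory": memory, "other": total - memory}


def llc_mpki(s: CounterSample) -> float:
    """LLC misses per thousand instructions."""
    if s.instructions <= 0:
        raise ValueError("need a positive instruction count")
    return 1000.0 * s.llc_misses / s.instructions


# Illustrative readings only, not measured data.
sample = CounterSample(cycles=2_000_000, instructions=1_000_000,
                       llc_misses=5_000, memory_stall_cycles=800_000)
print(cpi_stack(sample))
print(llc_mpki(sample))  # 5.0
```

The "other" component is simply the remainder, so it lumps together base execution and every non-memory stall; a finer stack needs more stall counters.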
“…For cores with multiple commit width, at each cycle, multiple counters can increase, each corresponding to a retirement slot. The mechanism described here is similar to the performance monitors in IBM POWER5 [32], with the following extensions: depending on how the cache miss is served, the dCache is incremented differently. Details will be discussed in Section 4.1.1.…”
Section: Performance Profile With Hardware Performance Monitors (mentioning)
confidence: 99%
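Per-retirement-slot accounting of the kind this statement describes can be sketched as below. It is a simplified model under stated assumptions: the input is a hypothetical per-cycle trace of `(committed, head_cause)` pairs, and every unused slot in a cycle is charged to whatever blocks the ROB head. It does not model the quoted paper's extension of incrementing the dCache counter differently depending on how the miss is served.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def slot_accounting(trace: Iterable[Tuple[int, Optional[str]]],
                    width: int) -> Counter:
    """Charge every retirement slot of every cycle to exactly one counter.

    Each trace entry is (committed, head_cause): the number of instructions
    retired that cycle, and a label for whatever blocks the ROB head (only
    consulted when some slots go unused).
    """
    counts: Counter = Counter()
    for committed, head_cause in trace:
        if not 0 <= committed <= width:
            raise ValueError(f"committed={committed} outside 0..{width}")
        counts["base"] += committed              # slots that retired work
        empty = width - committed
        if empty:
            if head_cause is None:
                raise ValueError("empty slots need a stall cause")
            counts[head_cause] += empty          # blame the ROB head
    return counts


def cpi_from_slots(counts: Counter, width: int) -> dict:
    """Convert slot counts to CPI components that sum to cycles/instructions."""
    instructions = counts["base"]
    if instructions == 0:
        raise ValueError("no instructions retired")
    return {k: v / (width * instructions) for k, v in counts.items()}


# Hypothetical 4-wide trace: one full cycle, two dcache stalls, one partial cycle.
trace = [(4, None), (0, "dcache"), (0, "dcache"), (2, "branch")]
counts = slot_accounting(trace, width=4)
print(counts)
print(cpi_from_slots(counts, width=4))
```

Because each cycle contributes exactly `width` slot increments, the components always sum to total CPI, and the "base" component equals 1/width, the ideal CPI of the core.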
“…Obtaining accurate execution time breakdowns in an out-of-order processor core is difficult due to the overlap of multiple on-the-fly instructions. Examining the instructions at the head of ROB gives us some clues [32] to the cause of a stall. In this section, we show how to obtain such execution time breakdowns for TLS execution.…”
Section: Performance Profile With Hardware Performance Monitors (mentioning)
confidence: 99%
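The overlap problem these statements mention can be illustrated with a small sketch. It is a deliberate simplification, not the ROB-head method itself: misses are given as hypothetical `(start_cycle, latency)` pairs, and commit is assumed to stall exactly while at least one miss is outstanding. Summing each miss's full latency double-counts overlapped cycles, whereas taking the union of the miss intervals counts each stalled cycle once.

```python
from typing import List, Tuple


def naive_miss_cycles(misses: List[Tuple[int, int]]) -> int:
    """Charge every miss its full latency, ignoring overlap."""
    return sum(latency for _, latency in misses)


def overlapped_miss_cycles(misses: List[Tuple[int, int]]) -> int:
    """Count cycles with at least one miss outstanding (union of intervals)."""
    intervals = sorted((start, start + latency) for start, latency in misses)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start   # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)        # overlapping: extend the run
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Two independent 200-cycle misses issued 10 cycles apart.
pair = [(0, 200), (10, 200)]
print(naive_miss_cycles(pair))       # 400
print(overlapped_miss_cycles(pair))  # 210
```

Examining the instruction at the head of the ROB, as the quoted passage suggests, reaches a similar effect in hardware: a cycle is charged to a miss only while that miss actually blocks commit, so the second, hidden miss is charged just for its residual 10 cycles.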