A performance counter architecture for computing accurate CPI components

Eyerman, Stijn; Eeckhout, Lieven; Karkhanis, Tejas S.; Smith, James E.

doi:10.1145/1168918.1168880

Cited by 58 publications

(90 citation statements)

References 16 publications


“…Power5's 2-way SMT feature does a relatively good job of keeping the pipeline busy, however instruction cache misses still account for a significant fraction of stall cycles (12%) for both of the WebSphere J2EE applications. It has been shown that the Power5 counter mechanism actually underestimates the performance penalty of icache misses [13]. Consequently, we consider this estimate a conservative lower bound.…”

Section: Stall Cycles (mentioning; confidence: 99%)

Call-chain Software Instruction Prefetching in J2EE Server Applications

Nagpurkar, Cain, Serrano, et al. (2007)

16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007)


We present a detailed characterization of instruction cache performance for IBM's J2EE-enabled web server, WebSphere Application Server (WAS). When running two J2EE benchmarks on WebSphere, we find that instruction cache misses cause a 12% performance penalty on current-generation Power5-based multiprocessor systems. To mitigate this performance loss, we describe a new call-chain based algorithm for inserting software prefetch instructions, and evaluate its potential for improved instruction cache performance. The performance of this algorithm depends on the selection of several independent parameters which control the distance and number of prefetches inserted for a particular method. We select these parameters through characterization of the WebSphere applications, and ultimately find that our call-chain based insertion algorithm achieves significant reduction in instruction cache miss rate for Java methods.


“…For example, a load instruction can miss in the data cache a few cycles after a branch is mispredicted. However, it has been observed (and we confirmed) that overlapping between different types of miss-events is rare enough that ignoring it results in negligible error in typical applications [19], [12]. This paper focuses on improving the accuracy of the modeled $CPI_{D\$miss}$ (i.e., CPI component due to long latency data cache misses) since it is the component with the largest error in prior first-order models [18], [19].…”

Section: Background: First-order Model (mentioning; confidence: 64%)

“…We compare against a cycle accurate simulator rather than real hardware to validate our models since a simulator provides insights that would be challenging to obtain without changes to currently deployed superscalar performance counter hardware [12]. We believe the most important factor is comparing two or more competing (hybrid) analytical models against a single detailed simulator provided the latter captures the behavior one wishes to model analytically.…”

Section: Methods (mentioning; confidence: 99%)

Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs

Chen, Aamodt (2008)

2008 41st IEEE/ACM International Symposium on Microarchitecture
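The first Chen and Aamodt statement above describes the first-order model: total CPI is split into per-miss-event components, and overlap between different miss-event types is ignored. A minimal Python sketch of that bookkeeping follows, assuming each component is simply events per instruction times an average penalty in cycles. The `MissEvent` type, the event names, and the numbers in the demo are hypothetical, not values from any cited paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MissEvent:
    """One miss-event type in a first-order CPI model (hypothetical fields)."""
    name: str
    events_per_instruction: float  # e.g. misses per committed instruction
    penalty_cycles: float          # assumed average cycles lost per event


def cpi_stack(base_cpi: float, events: list[MissEvent]) -> dict[str, float]:
    """Build a CPI stack. Overlap between miss-event types is ignored
    (the first-order assumption), so the components simply add up."""
    stack = {"base": base_cpi}
    for e in events:
        stack[e.name] = e.events_per_instruction * e.penalty_cycles
    # The right-hand side is evaluated before "total" is inserted.
    stack["total"] = sum(stack.values())
    return stack


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    demo = cpi_stack(0.5, [
        MissEvent("branch_mispredict", 0.01, 15.0),
        MissEvent("icache_miss", 0.005, 10.0),
        MissEvent("dcache_long_miss", 0.002, 200.0),
    ])
    for component, cpi in demo.items():
        print(f"{component:>18}: {cpi:.3f}")
```

Because the components are additive, the share of CPI attributed to one event type (for example the long-latency data cache miss component the quote singles out) is that entry divided by `total`. Any real overlap between event types would be double-counted by this sum, which is the error the quoted authors report as negligible for typical applications.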


“…A recent paper [21] proposes a new cycle accounting architecture for SMT processors based on estimating the CPI stack of each running task [22]. This proposal tracks fifteen different components of the CPI stack with dedicated hardware.…”

Section: Related Work (mentioning; confidence: 99%)

ITCA: Inter-task Conflict-Aware CPU Accounting for CMPs

Luque, Moretó, Cazorla, et al. (2009)

2009 18th International Conference on Parallel Architectures and Compilation Techniques


Abstract: Chip-MultiProcessor (CMP) architectures are becoming more and more popular as an alternative to the traditional processors that only extract instruction-level parallelism from an application. CMPs introduce complexities when accounting CPU utilization. This is due to the fact that the progress done by an application during an interval of time highly depends on the activity of the other applications it is co-scheduled with. In this paper, we identify how an inaccurate measurement of the CPU utilization affects several key aspects of the system like the application scheduling or the charging mechanism in data centers. We propose a new hardware CPU accounting mechanism to improve the accuracy when measuring the CPU utilization in CMPs and compare it with the previous accounting mechanisms. Our results show that currently known mechanisms lead to a 19% average error when it comes to CPU utilization accounting. Our proposal reduces this error to less than 1% in a modeled 4-core processor system.
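The Related Work statement quoted for the ITCA paper mentions hardware that tracks CPI-stack components per task. As a rough software analogy only, and not the design of [21], of ITCA, or of the counter architecture in this paper, the sketch below keeps one counter per component and charges every cycle to exactly one of them: to `base` if at least one instruction commits, otherwise to that cycle's stall cause. The per-cycle trace format and the cause labels are assumptions, and this naive attribution deliberately ignores how overlapping causes should be resolved.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional, Tuple


def cpi_stack_from_trace(
    trace: Iterable[Tuple[int, Optional[str]]],
) -> dict[str, float]:
    """Naive per-cycle CPI accounting (illustrative, hypothetical format).

    Each trace entry is (instructions_committed, stall_cause) for one cycle.
    A committing cycle is charged to "base"; a non-committing cycle is
    charged to its stall cause, or to "other" if no cause is given.
    """
    cycle_counters = Counter()  # one counter per CPI-stack component
    instructions = 0
    for committed, cause in trace:
        instructions += committed
        if committed > 0:
            cycle_counters["base"] += 1
        else:
            cycle_counters[cause or "other"] += 1
    if instructions == 0:
        raise ValueError("trace commits no instructions; CPI is undefined")
    # Dividing each cycle count by the instruction count turns the
    # counters into CPI components that sum to the overall CPI.
    return {name: cycles / instructions for name, cycles in cycle_counters.items()}


if __name__ == "__main__":
    # Hypothetical six-cycle trace in which 3 instructions commit.
    trace = [(2, None), (0, "icache"), (0, "icache"),
             (1, None), (0, "dcache"), (0, None)]
    print(cpi_stack_from_trace(trace))
```

For the demo trace, the four components add up to 6 cycles / 3 instructions = 2 CPI. Charging each cycle to a single counter keeps the stack consistent with the measured total, but deciding which cause "owns" a cycle when several are pending is exactly where naive counting becomes inaccurate.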
