2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2022
DOI: 10.1109/cgo53902.2022.9741289

PALMED: Throughput Characterization for Superscalar Architectures

Abstract: In a super-scalar architecture, the scheduler dynamically assigns micro-operations (µOPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into µOPs and lists for each µOP the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the performance throughput of a sequence of instructions repeatedly executed as the core component of a loop. This paper introduces a dual equivalent representation: The resource mapping …
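
To make the notion of a port mapping concrete, the following is a minimal Python sketch of how one might bound the steady-state throughput of a dependency-free loop body from such a mapping. It is not the algorithm of the paper. The instruction names, port names, and the function `cycles_per_iteration` are hypothetical, and the sketch assumes an ideal scheduler that balances µOPs perfectly across ports. Under that assumption, the cycles per iteration equal the largest ratio, over all port subsets S, of the number of µOPs confined to S divided by the size of S.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical port mapping (illustrative only, not measured data):
# each instruction decomposes into µOPs, and each µOP is described
# by the set of ports it may execute on.
PORT_MAPPING = {
    "add":   [frozenset({"p0", "p1"})],
    "mul":   [frozenset({"p0"})],
    "store": [frozenset({"p2"}), frozenset({"p0", "p1"})],
}


def cycles_per_iteration(kernel, mapping):
    """Bound the cycles per loop iteration for `kernel`.

    `kernel` maps instruction names to occurrence counts.
    Assumes no data dependencies and an ideal scheduler.
    """
    # Accumulate how many µOPs share each set of admissible ports.
    load = {}
    for instr, count in kernel.items():
        for ports in mapping[instr]:
            load[ports] = load.get(ports, 0) + count

    all_ports = sorted(set().union(*load))
    best = Fraction(0)
    # Brute force over every non-empty subset of ports (exponential,
    # acceptable only for a handful of ports).
    for size in range(1, len(all_ports) + 1):
        for subset in combinations(all_ports, size):
            chosen = frozenset(subset)
            # µOPs that can run only on ports inside `chosen`.
            confined = sum(n for ports, n in load.items() if ports <= chosen)
            best = max(best, Fraction(confined, size))
    return best


if __name__ == "__main__":
    kernel = {"add": 2, "mul": 1}
    cycles = cycles_per_iteration(kernel, PORT_MAPPING)
    print(cycles, sum(kernel.values()) / cycles)
```

With the hypothetical mapping above, two `add` and one `mul` place three µOPs on ports p0 and p1, so the bound is 3/2 cycles per iteration. Exact fractions are used so the result is not blurred by floating-point rounding.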

Citations: cited by 3 publications (6 citation statements)
References: 21 publications
“…With a throughput of 6 IPC and up to 4 ports per µop [15, Chapters 23-24], Zen3 and Zen4 could therefore be handled similarly as the previous Zen generations. 7 PMEvo [29] and Palmed [14] as points of comparison in Section 4.5. Since Zen+ does not have full per-port µop counters, the original uops.info algorithm is not applicable.…”
Section: Case Study: the AMD Zen+ Architecture (mentioning)
confidence: 99%
“…We evaluate our Zen+ port mapping quantitatively by comparing its throughput prediction accuracy against PMEvo [29] and Palmed [14]. 9 As port mappings model only the use of functional units, we focus on instruction sequences whose throughput is not limited by data dependencies.…”
Section: Prediction Accuracy - Port Mapping (mentioning)
confidence: 99%
“…Approaches that infer models for throughput predictors are also evaluated against existing ones on a measured ground truth: PMEvo [Ritter and Hack 2020] and Palmed [Derumigny et al 2022] both use basic blocks without data dependencies, whose throughput is bound by the processor's functional units. For Palmed, the basic blocks mirror basic blocks observed in the binaries of benchmark suites (without the dependencies).…”
Section: Testing Throughput Predictors (mentioning)
confidence: 99%
“…The AnICA algorithm can also be applied to subcomponents of performance models that affect only individual aspects of basic block throughput prediction. For instance, approaches like uops.info [Abel and Reineke 2019], PMEvo [Ritter and Hack 2020], and Palmed [Derumigny et al 2022] build models for how individual instructions use a CPU's execution resources. These models are able to predict the throughput of basic blocks without data dependencies.…”
Section: Possible Extensions (mentioning)
confidence: 99%