PALMED: Throughput Characterization for Superscalar Architectures

Derumigny, Nicolas; Bastian, Théophile; Gruber, Fabian; Iooss, Guillaume; Guillon, Christophe; Pouchet, Louis-Noël; Rastello, Fabrice

doi:10.1109/cgo53902.2022.9741289

Cited by 3 publications

(6 citation statements)

References 21 publications

“…With a throughput of 6 IPC and up to 4 ports per µop [15, Chapters 23-24], Zen3 and Zen4 could therefore be handled similarly as the previous Zen generations. … PMEvo [29] and Palmed [14] as points of comparison in Section 4.5. Since Zen+ does not have full per-port µop counters, the original uops.info algorithm is not applicable.…”

Section: Case Study: the AMD Zen+ Architecture (mentioning)

confidence: 99%

“…We evaluate our Zen+ port mapping quantitatively by comparing its throughput prediction accuracy against PMEvo [29] and Palmed [14]. As port mappings model only the use of functional units, we focus on instruction sequences whose throughput is not limited by data dependencies.…”

Section: Prediction Accuracy - Port Mapping (mentioning)

confidence: 99%

“…In contrast to the uops.info algorithm, PMEvo's evolutionary algorithm cannot provide explanatory microbenchmarks to bolster confidence in the results. Palmed [14] infers conjunctive resource mappings with good performance prediction results, but they do not map directly to the microarchitecture and therefore do not fit into existing tools.…”

Section: Introduction (mentioning)

confidence: 99%

Explainable Port Mapping Inference with Sparse Performance Counters for AMD's Zen Architectures

Ritter, Hack (2024). Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems.

Performance models are instrumental for optimizing performance-sensitive code. When modeling the use of functional units of out-of-order x86-64 CPUs, data availability varies by the manufacturer: Instruction-to-port mappings for Intel's processors are available, whereas information for AMD's designs is lacking. The reason for this disparity is that standard techniques to infer exact port mappings require hardware performance counters that AMD does not provide. In this work, we modify the port mapping inference algorithm of the widely used uops.info project to not rely on Intel's performance counters. The modifications are based on a formal port mapping model with a counter-example-guided algorithm powered by an SMT solver. We investigate in how far AMD's processors comply with this model and where unexpected performance characteristics prevent an accurate port mapping. Our results provide valuable insights for creators of CPU performance models as well as for software developers who want to achieve peak performance on recent AMD CPUs.

“…Approaches that infer models for throughput predictors are also evaluated against existing ones on a measured ground truth: PMEvo [Ritter and Hack 2020] and Palmed [Derumigny et al. 2022] both use basic blocks without data dependencies, whose throughput is bound by the processor's functional units. For Palmed, the basic blocks mirror basic blocks observed in the binaries of benchmark suites (without the dependencies).…”

Section: Testing Throughput Predictors (mentioning)

confidence: 99%

“…The AnICA algorithm can also be applied to subcomponents of performance models that affect only individual aspects of basic block throughput prediction. For instance, approaches like uops.info [Abel and Reineke 2019], PMEvo [Ritter and Hack 2020], and Palmed [Derumigny et al. 2022] build models for how individual instructions use a CPU's execution resources. These models are able to predict the throughput of basic blocks without data dependencies.…”

Section: Possible Extensions (mentioning)

confidence: 99%

AnICA: Analyzing Inconsistencies in Microarchitectural Code Analyzers

Ritter, Hack (2022). Preprint.

Microarchitectural code analyzers, i.e., tools that estimate the throughput of machine code basic blocks, are important utensils in the tool belt of performance engineers. Recent tools like llvm-mca, uiCA, and Ithemal use a variety of techniques and different models for their throughput predictions. When put to the test, it is common to see these state-of-the-art tools give very different results. These inconsistencies are either errors, or they point to different and rarely documented assumptions made by the tool designers. In this paper, we present AnICA, a tool taking inspiration from differential testing and abstract interpretation to systematically analyze inconsistencies among these code analyzers. Our evaluation shows that AnICA can summarize thousands of inconsistencies in a few dozen descriptions that directly lead to high-level insights into the different behavior of the tools. In several case studies, we further demonstrate how AnICA automatically finds and characterizes known and unknown bugs in llvm-mca, as well as a quirk in AMD's Zen microarchitectures.

CCS Concepts: • Software and its engineering → Correctness; Software verification and validation; Software testing and debugging; • Theory of computation → Abstraction.

AnICA: analyzing inconsistencies in microarchitectural code analyzers

Ritter, Hack (2022). Proc. ACM Program. Lang.

Microarchitectural code analyzers, i.e., tools that estimate the throughput of machine code basic blocks, are important utensils in the tool belt of performance engineers. Recent tools like llvm-mca, uiCA, and Ithemal use a variety of techniques and different models for their throughput predictions. When put to the test, it is common to see these state-of-the-art tools give very different results. These inconsistencies are either errors, or they point to different and rarely documented assumptions made by the tool designers. In this paper, we present AnICA, a tool taking inspiration from differential testing and abstract interpretation to systematically analyze inconsistencies among these code analyzers. Our evaluation shows that AnICA can summarize thousands of inconsistencies in a few dozen descriptions that directly lead to high-level insights into the different behavior of the tools. In several case studies, we further demonstrate how AnICA automatically finds and characterizes known and unknown bugs in llvm-mca, as well as a quirk in AMD's Zen microarchitectures.
