Tools for High Performance Computing 2016 (2017)
DOI: 10.1007/978-3-319-56702-0_1

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Abstract: Achieving optimal program performance requires deep insight into the interaction between hardware and software. For software developers without an in-depth background in computer architecture, understanding and fully utilizing modern architectures is close to impossible. Analytic loop performance modeling is a useful way to understand the relevant bottlenecks of code execution based on simple machine models. The Roofline Model and the Execution-Cache-Memory (ECM) model are proven approaches to performance model…
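
As an illustration of the kind of simple machine model the abstract refers to, the basic Roofline bound can be sketched in a few lines. This is a minimal sketch of the textbook formula (attainable performance is the smaller of peak compute performance and memory bandwidth times arithmetic intensity), not Kerncraft's implementation. The function name `roofline`, the machine numbers, and the example kernel are all assumptions chosen for illustration.

```python
def roofline(peak_flops, mem_bandwidth, intensity):
    """Basic Roofline bound (illustrative sketch, not Kerncraft code).

    peak_flops    -- peak compute performance in FLOP/s
    mem_bandwidth -- sustained memory bandwidth in byte/s
    intensity     -- arithmetic intensity of the loop in FLOP/byte
    Returns (attainable FLOP/s, name of the limiting resource).
    """
    memory_limit = mem_bandwidth * intensity
    if memory_limit < peak_flops:
        return memory_limit, "memory"
    return peak_flops, "compute"


# Hypothetical machine: 100 GFLOP/s peak, 48 GB/s memory bandwidth.
PEAK = 100e9
BANDWIDTH = 48e9

# Hypothetical kernel a[i] = b[i] + s * c[i] in double precision:
# 2 FLOP per iteration, 3 * 8 = 24 bytes of traffic (write-allocate ignored).
perf, limit = roofline(PEAK, BANDWIDTH, 2 / 24)
print(f"{perf / 1e9:.1f} GFLOP/s, bound by {limit}")
```

Under these assumed numbers the kernel is limited by memory bandwidth rather than by the core, which is the type of bottleneck statement such models are meant to provide.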

Cited by 25 publications (28 citation statements)
References 13 publications
“…If a dataset fits in the first-level cache, all accesses will behave the same and there is no need to consider the order and pattern of previous accesses or (possibly undisclosed) cache replacement algorithms. Behavior beyond L1 can be modeled separately, but this is beyond the scope of this work (the Kerncraft tool [4], which relies on an in-core analysis from IACA and, in the future, OSACA, combines it with data analysis for a unified Roofline or ECM prediction). 2) Multiple available ports per instruction are utilized with fixed probabilities.…”
Section: A Background (mentioning)
confidence: 99%
“…Once known, the bottleneck can often be mitigated by changes in the code, the runtime parameters, or the execution environment. When the models' construction is automated [3], [4], compilers and a wider user base can take advantage of them.…”
Section: Introduction (mentioning)
confidence: 99%
“…By including information available from performance models for the different algorithms, the workload estimator can be made more general and flexible. Tools like Kerncraft [41] automatically analyze the performance of a given implementation for the hardware at hand, which would render the estimator independent of these factors. Furthermore, a workload estimate based on the current runtimes is a natural alternative to the proposed predictor as it is able to use actual data from the currently running simulation.…”
Section: Results (mentioning)
confidence: 99%
“…However, they require a deep understanding of the underlying micro-architecture in order to yield accurate results. Common (simplified) approaches for numerical kernels are the Roofline [1] model or the ECM [2] model, whose construction is supported by the Kerncraft open-source performance modeling tool [3]. For Roofline, the Roofline Model Toolkit [4] and Intel's Roofline Advisor 1 are also available.…”
Section: Introduction (mentioning)
confidence: 99%
“…With OSACA's semi-automatic benchmarking pipeline, compilers can benefit from an automated model construction [3], [4]. The instruction database is dynamically extendable, which enables users to adapt the tool to other application scenarios beyond numerical kernels found in HPC use cases.…”
Section: Introduction (mentioning)
confidence: 99%