2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2020
DOI: 10.1109/pmbs51919.2020.00006

Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX

Cited by 14 publications
(7 citation statements)
References 12 publications
“…Actually, in the papers by Kreutzer et al [2014] and Almasri and Abu-Sufah [2020], we can find that almost no performance improvement by ELLPACK type kernels over the CSR kernel was obtained for sufficiently large matrices on standard multi-core CPUs. Here, it is worth noting that this tendency differs on many-core CPUs such as Intel Xeon Phi; the effectiveness of SpMV kernels using ELLPACK type formats was reported by Kreutzer et al [2014], Alappat et al [2020], and Nakajima et al [2021].…”
Section: Summary Of the Experiments (citation type: mentioning)
confidence: 90%
“…We have further improved the ECM machine model for the A64FX CPU introduced in [1] and showed its applicability to the Fugaku processor. We validated the model with simple streaming kernels and could observe a high accuracy for in-memory data sets.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…More importantly, we have substantially increased the scope of both topics, e.g., by improving the ECM model considering the impact of page sizes and by presenting a detailed ECM model and performance-tuning strategies for SpMV. Topics presented here but not covered in [1] include the case study of the Lattice QCD kernel, the investigation of power-saving mechanisms and specific hardware features of the A64FX and the comparison with state-of-the-art CPUs and GPGPUs.…”
Section: Extended Version Of Workshop Short Paper (citation type: mentioning)
confidence: 99%
“…benchmarks or other code optimization projects). The emerging pattern is that high speedups typically require much more involved optimization work, such as explicit loop unrolling and development of detailed performance models [25], which call for dedicate projects, when not dedicated staff; but can result in general optimization hints all users will benefit from.…”
Section: Test Run On A64fx Architecture (citation type: mentioning)
confidence: 99%