2022
DOI: 10.1016/j.cpc.2021.108193

Cache blocking strategies applied to flux reconstruction

Cited by 8 publications (1 citation statement)
References 11 publications
“…Completely merging A and P^{-1}, as in the case of a diagonal matrix, is not straightforward in the case of ASM: it requires nesting of two cell loops (also known as "power kernel", see e.g., Malas et al (2017) and the related work of Akkurt et al (2022) for DG-type discretizations). Our preliminary investigations have shown that running the two cell loops in sequence is faster for higher-order elements, because the nested loops increase the active working set, resulting in the deterioration of the data locality of the outer loop particularly due to the MPI communication requirements.…”
Section: Data Locality In Chebyshev Iterations (mentioning)
confidence: 99%
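
The "diagonal matrix" case mentioned in the statement above can be illustrated with a minimal sketch. It is not code from either paper: the function names, the dense list-of-lists matrix, and the 3x3 test data are all assumptions made for illustration. It shows only why merging is easy when P is diagonal (each preconditioned entry depends on a single residual entry), so the two loops can be fused without changing the result. The ASM case discussed in the statement is not covered here.

```python
# Hypothetical illustration: computing z = P^{-1} (b - A x) for a diagonal P.
# All names and data below are assumptions, not taken from the cited works.

def residual_then_precondition(A, d, x, b):
    """Two loops in sequence: first r = b - A x, then z = r / d."""
    n = len(b)
    r = [0.0] * n
    for i in range(n):  # first loop: apply A and form the residual
        r[i] = b[i] - sum(A[i][j] * x[j] for j in range(n))
    z = [0.0] * n
    for i in range(n):  # second loop: apply the diagonal P^{-1}
        z[i] = r[i] / d[i]
    return z


def merged_residual_precondition(A, d, x, b):
    """One merged loop: each residual entry is scaled as soon as it is computed."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        r_i = b[i] - sum(A[i][j] * x[j] for j in range(n))
        z[i] = r_i / d[i]  # no full residual vector is stored
    return z


A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
d = [A[i][i] for i in range(3)]  # diagonal (Jacobi) preconditioner entries
x = [1.0, 2.0, 3.0]
b = [1.0, 1.0, 1.0]

z_sequential = residual_then_precondition(A, d, x, b)
z_merged = merged_residual_precondition(A, d, x, b)
```

In the merged variant the intermediate residual never has to be written out and read back, which is the data-locality benefit the statement alludes to. For a preconditioner that couples several entries, such as the ASM patches in the statement, this entry-by-entry fusion is no longer possible.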