Proceedings of the 34th ACM International Conference on Supercomputing 2020
DOI: 10.1145/3392717.3392765
Modeling and optimizing NUMA effects and prefetching with machine learning

Abstract: Both NUMA thread/data placement and hardware prefetcher configuration have significant impacts on HPC performance. Optimizing both together leads to a large and complex design space that has previously been impractical to explore at runtime. In this work we deliver the performance benefits of optimizing both NUMA thread/data placement and prefetcher configuration at runtime through careful modeling and online profiling. To address the large design space, we propose a prediction model that reduces the amount of…

Cited by 16 publications (21 citation statements)
References 44 publications
“…Collecting counters from a single configuration is more straightforward as we do not need to change the NUMA/prefetch configuration across executions. However, collecting counters across configurations has higher tuning potential in the context of compiler [49] or NUMA/prefetch performance optimization [39]. By profiling the same counter across different configurations, we measure its reaction to various contexts.…”
Section: Application Characteristics (Features)
confidence: 99%
“…By profiling the same counter across different configurations, we measure its reaction to various contexts. This reaction is valuable information to guide optimizations, so-called reaction-based profiling [39], [49]. We also consider the 1536 (768 * 2) performance and energy measurements as reaction-based features in our models.…”
Section: Application Characteristics (Features)
confidence: 99%
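The reaction-based profiling the citing authors describe can be sketched as follows: the same hardware counter is measured under each candidate NUMA/prefetch configuration, and the vector of per-configuration readings (the counter's "reaction") serves as a model feature. This is a minimal illustrative sketch; the function names and the toy counter model are placeholders, not the paper's actual profiling infrastructure.

```python
# Reaction-based profiling sketch: profile one counter under several
# configurations; the per-configuration readings form one feature vector.
# All names and numbers are illustrative, not taken from the paper.

def reaction_features(measure, configurations):
    """Measure one counter under each configuration; return the reaction vector."""
    return [measure(cfg) for cfg in configurations]

def fake_llc_misses(cfg):
    # Toy stand-in for a real profiler: pretend last-level-cache misses
    # drop when prefetching is disabled and rise under NUMA interleaving.
    return 1000 - 200 * cfg["prefetch_off"] + 50 * cfg["numa_interleave"]

configs = [
    {"prefetch_off": 0, "numa_interleave": 0},
    {"prefetch_off": 1, "numa_interleave": 0},
    {"prefetch_off": 0, "numa_interleave": 1},
    {"prefetch_off": 1, "numa_interleave": 1},
]

features = reaction_features(fake_llc_misses, configs)
print(features)  # [1000, 800, 1050, 850]
```

How strongly the counter shifts across configurations is exactly the "reaction" the quoted passage says is valuable for guiding optimization.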
“…The latency and bandwidth in NUMA systems vary based on the nodes involved in data access and storage. Extensive research has been conducted on NUMA optimizations, which primarily revolves around strategically placing threads and data across nodes to minimize latency and maximize bandwidth [25][26][27][28][29]. The performance of a system is influenced by both hardware and software design choices that take into account the memory system architecture.…”
Section: Effect Of Numa Access
confidence: 99%
“…The importance of considering NUMA and prefetching simultaneously to improve the performance of parallel applications has already been shown [134],…”
Section: Prefetchers For Numa Systems
confidence: 99%
“…Performance counters and sampled executions can be used to build machine learning models that make it possible to choose combined configurations of NUMA and prefetchers at execution time for applications not in the model [134].…”
Section: Using Models To Drive Configurations
confidence: 99%