50th International Conference on Parallel Processing 2021
DOI: 10.1145/3472456.3473517
Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors

Cited by 17 publications (3 citation statements)
References 29 publications
“…Application-specific performance models [32,41,60] introduce domain knowledge into the prediction and often use the generated communication or performance model to inform an optimization search without executing the program, which might be expensive due to running on distributed environments.…”
Section: Related Work
confidence: 99%
“…Existing open-source RLHF frameworks such as Transformer Reinforcement Learning (TRL), Colos-salChat (CAIChat), and DeepSpeed-Chat (DSChat) rely on parallelization approaches like Zero Redundancy Optimizer (ZeRO) to co-locate the four models involved in RLHF training on the same GPU [14,28,20]. However, as models continue to grow past 70 billion parameters, this scheduling approach becomes increasingly inefficient with limited GPU memory.…”
Section: Introduction
confidence: 99%
“…For a given dimension, the distance between the center point and its farthest neighbor is denoted as the radius of the stencil. Due to its intrinsic nature, stencil computation often suffers from low memory bandwidth and poor locality [40] on modern processors, which makes it notorious for performance optimization [18,20,44].…”
confidence: 99%
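To make the radius definition and the bandwidth problem quoted above concrete, here is a minimal illustrative sketch (not taken from the cited paper) of a radius-1, one-dimensional Jacobi-style stencil: each interior point is replaced by the average of itself and its two nearest neighbors, so the farthest neighbor lies at distance 1 from the center point.

```python
def jacobi_1d(u, steps):
    """Radius-1 1-D stencil: average each interior point with its neighbors.

    Illustrative sketch only; real stencil codes operate on large
    multi-dimensional arrays where this access pattern becomes
    memory-bandwidth bound.
    """
    u = list(u)
    for _ in range(steps):
        new = list(u)
        for i in range(1, len(u) - 1):
            # Each update reads 3 values but performs only a few flops,
            # i.e. low arithmetic intensity -- the root of the poor
            # bandwidth utilization and locality noted in the quote.
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = new
    return u

print(jacobi_1d([0.0, 0.0, 3.0, 0.0, 0.0], 1))
```

One sweep spreads the central value to its radius-1 neighborhood, which is exactly the data-reuse pattern that tiling and other locality optimizations for stencils try to exploit.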