2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
DOI: 10.1109/cgo53902.2022.9741270
Automatic Horizontal Fusion for GPU Kernels


Cited by 19 publications (5 citation statements). References 0 publications.
“…Automatic fusion of GPU kernels is a known optimization technique used for accelerating many scientific [12-14] and deep learning [15, 16] applications, and is always bound to some compiler technology. Aggregating knowledge from other studies [17, 18], we can formulate three distinct reasons for GPU kernel fusion: (1) to achieve better instruction latency hiding by fusing two data-independent kernels that require different kinds of GPU resources; (2) to eliminate intermediate data round trips by fusing neighboring data-dependent kernels; (3) to reduce energy consumption and thus improve GPU power efficiency. It is worth pointing out that reason (2) is the most common because many GPU kernels are memory-bound and data-dependent.…”
Section: Related Work
confidence: 99%
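As an aside, reason (1) in the statement above is the idea behind horizontal fusion. The following is a minimal CUDA sketch of that general idea only: two data-independent kernels with different instruction mixes are merged into one kernel so their work is resident on the GPU at the same time. The kernel bodies, names, and the block-level partitioning are illustrative assumptions, not the specific fusion scheme used by the paper.

#include <cuda_runtime.h>

// Original kernel A: memory-bound elementwise scaling.
__global__ void kernelA(const float* x, float* y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = 2.0f * x[i];
}

// Original kernel B: compute-bound, many FMAs per element.
__global__ void kernelB(const float* x, float* y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = x[i];
    for (int k = 0; k < 256; ++k) v = v * 1.0001f + 0.5f;
    y[i] = v;
  }
}

// Horizontally fused kernel: the first blocksA thread blocks execute A's
// body, the remaining blocks execute B's body, so the memory-bound and
// compute-bound instruction mixes overlap instead of running back to back.
__global__ void fusedAB(const float* xa, float* ya, int na,
                        const float* xb, float* yb, int nb, int blocksA) {
  if (blockIdx.x < blocksA) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < na) ya[i] = 2.0f * xa[i];
  } else {
    int i = (blockIdx.x - blocksA) * blockDim.x + threadIdx.x;
    if (i < nb) {
      float v = xb[i];
      for (int k = 0; k < 256; ++k) v = v * 1.0001f + 0.5f;
      yb[i] = v;
    }
  }
}

// Launch sketch: one grid covers both workloads.
//   int threads = 256;
//   int blocksA = (na + threads - 1) / threads;
//   int blocksB = (nb + threads - 1) / threads;
//   fusedAB<<<blocksA + blocksB, threads>>>(xa, ya, na, xb, yb, nb, blocksA);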
“…This typically leads to many possible combinations. The algorithms for finding the best substitution graph differ and can be based on rules [15, 16], empirical searches [13], exhaustive searches coupled with automatic benchmarking [17] and performance models for pruning search spaces [12], or dynamic programming [18]. Due to the specifics of the ADER-DG method, we use a greedy approach and try to fuse the longest sequence of batched GEMM kernels extracted from streams of YATeTo's instructions using a simple finite automaton (see Section 5.3).…”
Section: Related Work
confidence: 99%
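The greedy strategy described in the statement above can be pictured with a small host-side sketch (plain C++, compilable as part of a CUDA source file): a two-state automaton scans an instruction stream and marks each maximal contiguous run of batched GEMM instructions as a fusion candidate. The Instr and Kind types and the example stream are hypothetical placeholders, not YATeTo's actual IR.

#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instruction kinds; a real IR would carry operands and shapes.
enum class Kind { BatchedGemm, Other };

struct Instr {
  Kind kind;
  std::string name;
};

// Two-state automaton (OUTSIDE / INSIDE a run): returns the [begin, end)
// ranges of maximal contiguous runs of batched GEMMs; each range is a
// candidate for fusion into one kernel, everything else is left untouched.
std::vector<std::pair<size_t, size_t>>
findFusibleRuns(const std::vector<Instr>& stream) {
  std::vector<std::pair<size_t, size_t>> runs;
  enum { OUTSIDE, INSIDE } state = OUTSIDE;
  size_t begin = 0;
  for (size_t i = 0; i < stream.size(); ++i) {
    bool gemm = (stream[i].kind == Kind::BatchedGemm);
    if (state == OUTSIDE && gemm) { state = INSIDE; begin = i; }
    else if (state == INSIDE && !gemm) { state = OUTSIDE; runs.emplace_back(begin, i); }
  }
  if (state == INSIDE) runs.emplace_back(begin, stream.size());
  return runs;
}

int main() {
  std::vector<Instr> stream = {{Kind::BatchedGemm, "gemm0"},
                               {Kind::BatchedGemm, "gemm1"},
                               {Kind::Other, "reduction"},
                               {Kind::BatchedGemm, "gemm2"}};
  for (const auto& r : findFusibleRuns(stream))
    std::printf("fuse instructions [%zu, %zu)\n", r.first, r.second);
  return 0;
}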
“…ALT addresses the two limitations via 1) the generic layout transformation submodule, which requires no re-implementation and is independent of the loop transformation, achieving the decoupling; and 2) an autotuning module at a higher level that orchestrates the cross-layer joint tuning while guaranteeing efficiency. Recent loop optimization techniques [2, 3, 5, 21, 42, 65, 66, 73, 78, 80, 85, 89-91], such as delicate cost models [3, 5, 42, 73], aggressive operator fusion [21, 40, 46, 50, 80, 90], and micro-kernel construction [91], are complementary to ALT.…”
Section: Related Work
confidence: 99%
“…Kernel fusion, referred to as operator fusion in the context of neural networks, has become a common technique for improving the performance of neural networks [27] and linear algebra [11]. Despite extensive research on this topic [17, 4], the effectiveness of kernel fusion highly depends on various prior and subsequent optimizations, which are the focus of our study.…”
Section: Introduction
confidence: 99%