Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2021)
DOI: 10.1145/3453483.3454106

AKG: automatic kernel generation for neural processing units using polyhedral transformations

Cited by 43 publications (20 citation statements) · References 72 publications
“…Those passes have complex optimization rules for different domain-specific code structures (e.g., big loops, large buffer allocation, and thread scheduling) that general-purpose mutators can hardly target. Hence, according to the hot spot program patterns targeted by existing tensor compilers [Chen et al. 2018; Ragan-Kelley et al. 2013; Tillet et al. 2019; Zhao et al. 2021], Tzer specifically designed 3 types of mutators: 1) loop-nesting mutator for creating multifarious dense loop structures; 2) memory-operation mutator for various memory allocation/store/load patterns at the index level; and 3) thread-binding mutator for diversifying the parallel computation flows to generate interesting code patterns that tensor compilers particularly care about. Loop Nesting.…”
Section: Domain-specific Mutation
confidence: 99%
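To make the mutator types in the excerpt above concrete, here is a minimal Python sketch of a loop-nesting mutator over a toy loop IR. The Loop dataclass and the mutate_loop_nesting helper are hypothetical illustrations of the technique, not Tzer's actual implementation or IR.

```python
import random
from dataclasses import dataclass

@dataclass
class Loop:
    var: str
    extent: int
    body: list  # nested Loop nodes or opaque statement strings

def collect_loops(node, out):
    # Recursively gather every Loop node reachable from this one.
    if isinstance(node, Loop):
        out.append(node)
        for child in node.body:
            collect_loops(child, out)

def mutate_loop_nesting(root, rng=random):
    # Pick a random loop and wrap its body in a fresh inner loop,
    # deepening the nest the way a loop-nesting mutator would.
    loops = []
    collect_loops(root, loops)
    target = rng.choice(loops)
    inner = Loop(var=target.var + "_inner",
                 extent=rng.choice([2, 4, 8]),
                 body=target.body)
    target.body = [inner]
    return root

ir = Loop("i", 128, [Loop("j", 64, ["C[i][j] = A[i][j] + B[i][j]"])])
print(mutate_loop_nesting(ir))
```

A memory-operation or thread-binding mutator would follow the same shape: walk the IR, pick a node, and rewrite allocation indices or loop-to-thread bindings instead of the nesting depth.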
“…However, hand-crafted optimization is time-consuming in the long run and a fixed binary cannot meet the ultimate performance requirements for all hardware vendors. Therefore, to fundamentally resolve those challenges, recently DL infrastructures have been focusing on developing tensor compilers [Chen et al. 2018; Google 2016; Intel 2017; Jin et al. 2020; Rotem et al. 2018; Tillet et al. 2019; Zhao et al. 2021] to automatically generate best-in-class target code for different vendors or even architectures.…”
Section: Introduction
confidence: 99%
“…[Table: mapping approaches of existing tensor compilers]
AutoTVM [10]: Hand-written Templates + Tuning
Ansor [68]: Generation Rules + Tuning
UNIT [58]: Hand-written Templates
XLA [18]: Templates and Rules
ISA Mapper [52]: Templates and Rules + Tuning
Tiramisu [4]: Polyhedral Model
AKG [67]: Polyhedral Model + Templates
AMOS: Analyzable Abstraction + Tuning

The following two intrinsics are from Tensor Core WMMA: mma_sync is a matrix multiplication intrinsic (compute) and load_matrix_sync is a matrix load intrinsic (memory).…”
Section: Name
confidence: 99%
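As a rough illustration of what those two intrinsics compute, the following NumPy sketch models load_matrix_sync as copying one tile of a matrix into a per-warp fragment and mma_sync as one m16n16k16 tile multiply-accumulate. This is a semantic model only, not the CUDA nvcuda::wmma device API, and the 64x64 operand size is an arbitrary choice for the example.

```python
import numpy as np

M = N = K = 16  # the standard WMMA m16n16k16 tile shape

def load_matrix_sync(src, row, col, rows, cols):
    # Semantic model of the memory intrinsic: copy one tile into a fragment.
    return src[row:row + rows, col:col + cols].astype(np.float32)

def mma_sync(acc, a_frag, b_frag):
    # Semantic model of the compute intrinsic: acc += a_frag @ b_frag.
    return acc + a_frag @ b_frag

A = np.random.rand(64, 64).astype(np.float16)
B = np.random.rand(64, 64).astype(np.float16)
acc = np.zeros((M, N), dtype=np.float32)

# Produce the (0, 0) output tile by stepping over K in 16-wide slices:
# the load -> mma pipeline a tensor compiler must emit for Tensor Cores.
for k in range(0, 64, K):
    a_frag = load_matrix_sync(A, 0, k, M, K)
    b_frag = load_matrix_sync(B, k, 0, K, N)
    acc = mma_sync(acc, a_frag, b_frag)

ref = A.astype(np.float32) @ B.astype(np.float32)
assert np.allclose(acc, ref[:M, :N], atol=1e-3)
```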
“…For example, TVM [9] exposes a tensorize interface for users to configure their own intrinsics, and the users have to manually invoke intrinsics when implementing the software. Polyhedral compilers such as AKG [67] rely on a combination of the polyhedral model and templates to map software onto spatial accelerators. AutoTVM [10] and UNIT [58] use hand-tuned templates with intrinsics to support a narrow range of operators and accelerators.…”
Section: Existing Mapping Flow
confidence: 99%
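To illustrate the tensorize interface the excerpt refers to, below is a minimal sketch following the pattern of TVM's TE-schedule tensorize tutorial: the user declares a tensor intrinsic and manually marks which loop level it replaces. The extern function name vadd16 is a hypothetical stand-in for a real hardware intrinsic.

```python
import tvm
from tvm import te

def intrin_vadd(n):
    # Declare the computation the intrinsic implements: a length-n vector add.
    a = te.placeholder((n,), name="a")
    b = te.placeholder((n,), name="b")
    c = te.compute((n,), lambda i: a[i] + b[i], name="c")

    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="Ab", offset_factor=1)
    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="Bb", offset_factor=1)
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="Cb", offset_factor=1)

    def intrin_func(ins, outs):
        # Replace the matched loop body with a call to the (hypothetical)
        # hardware intrinsic "vadd16".
        ib = tvm.tir.ir_builder.create()
        aa, bb = ins
        cc = outs[0]
        ib.emit(tvm.tir.call_extern("int32", "vadd16",
                                    cc.access_ptr("w"),
                                    aa.access_ptr("r"),
                                    bb.access_ptr("r")))
        return ib.get()

    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})

n = 1024
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

s = te.create_schedule(C.op)
xo, xi = s[C].split(C.op.axis[0], factor=16)
# The user manually chooses the loop level that maps onto the intrinsic.
s[C].tensorize(xi, intrin_vadd(16))
print(tvm.lower(s, [A, B, C], simple_mode=True))
```

This manual marking step is exactly the burden the excerpt contrasts with polyhedral approaches such as AKG, which search for the mapping rather than requiring the user to invoke the intrinsic by hand.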