2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
DOI: 10.1109/ipdpsw.2014.115

KernelGen -- The Design and Implementation of a Next Generation Compiler Platform for Accelerating Numerical Models on GPUs

Cited by 17 publications (9 citation statements); references 4 publications.
“…NVIDIA's jitify [48] is a library that simplifies the use of CUDA Runtime Compilation (NVRTC). KernelGen [49] is a Fortran/C compiler that automates GPU code generation with polyhedral loop analysis of LLVM IR. Those works present dynamic features such as runtime alias analysis and parameter tuning alongside kernel specialization.…”
Section: Related Work (mentioning, confidence: 99%)
“…Our implementation currently supports converting OpenMP code to HSTREAMS, CUDA and OpenCL programs. While we do not claim novelty on this as several works on source-to-source translation from OpenMP to CUDA [23], [24], [25], [26] or OpenCL [20], [27] exist, we believe the tool could serve as a useful utility for translating OpenMP programs to exploit multi-stream performance on heterogeneous many-core architectures. Figure 7 depicts our source-to-source code generator for translating OpenMP code to streamed programs.…”
Section: OpenMP to Streamed Code Generator (mentioning, confidence: 99%)
“…CodeExtractor does a flow analysis to detect all the live-in and live-out dependencies of the region to extract [Mikushin et al 2013]. This pass simplifies the codelet extraction process, since it extracts the region code in its own function.…”
Section: IR Capture and Replay Overview (mentioning, confidence: 99%)
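The live-in/live-out detection mentioned in the last statement can be illustrated with a minimal Python sketch. It assumes a hypothetical straight-line region in SSA form (each value defined exactly once, no branches); the names `Instr` and `live_in_out` are invented for illustration. It is not LLVM's CodeExtractor, which analyzes control-flow graphs of LLVM IR. Live-ins are values used in the region but defined outside it. Live-outs are values defined in the region and used after it. In an extracted function, live-ins would roughly correspond to inputs and live-outs to outputs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Set, List


@dataclass(frozen=True)
class Instr:
    """Hypothetical SSA instruction: defines at most one value, uses some values."""
    dest: Optional[str]      # value defined here (None if no result)
    uses: Tuple[str, ...]    # values read by this instruction


def live_in_out(region: List[Instr], after: List[Instr]) -> Tuple[Set[str], Set[str]]:
    """Return (live_ins, live_outs) of a straight-line SSA region.

    Assumes SSA: every value has a single definition, so a value defined
    in the region cannot be redefined later in `after`.
    """
    defined_in_region: Set[str] = set()
    live_ins: Set[str] = set()
    for ins in region:
        # A use not yet defined inside the region must come from outside.
        for v in ins.uses:
            if v not in defined_in_region:
                live_ins.add(v)
        if ins.dest is not None:
            defined_in_region.add(ins.dest)

    # Region-defined values that code after the region still reads.
    used_after = {v for ins in after for v in ins.uses}
    live_outs = defined_in_region & used_after
    return live_ins, live_outs


if __name__ == "__main__":
    region = [
        Instr("t1", ("a", "b")),
        Instr("t2", ("t1", "c")),
        Instr("t3", ("t2", "t1")),
    ]
    after = [Instr("r", ("t3", "a"))]
    ins, outs = live_in_out(region, after)
    print(sorted(ins), sorted(outs))  # ['a', 'b', 'c'] ['t3']
```

Here `t1` and `t2` stay internal to the region, so only `t3` escapes. A real extractor must also handle branches, memory side effects, and values live across loop back-edges, which this sketch deliberately ignores.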