Proceedings of the Tenth International Symposium on Code Generation and Optimization 2012
DOI: 10.1145/2259016.2259020
Dynamic compilation of data-parallel kernels for vector processors

Abstract: Modern processors enjoy augmented throughput and power efficiency through specialized functional units leveraged via instruction set extensions. These functional units accelerate performance for specific types of operations but must be programmed explicitly. Moreover, applications targeting these specialized units will not take advantage of future ISA extensions and tend not to be portable across multiple ISAs. As architecture designers increasingly rely on heterogeneity for performance improvements, the chall…

Cited by 11 publications (9 citation statements)
References 17 publications
“…We assess the efficiency of one particular approach to software-based compaction: the execution manager of Kerr et al. [2012]. At the current state of the art, this approach promises to most effectively eliminate control-flow divergence because, to the best of our knowledge, it is the approach that has the most freedom to rearrange threads.…”
Section: Control-flow Divergence Analysis
confidence: 99%
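The citation above concerns software-based thread compaction for eliminating control-flow divergence. The following is a hypothetical sketch of the general idea, not the execution manager from the paper: when threads in a fixed-size SIMD group ("warp") disagree on a branch, the group must serialize both paths; regrouping threads by branch direction before execution removes most of that divergence. `WARP_SIZE`, `divergent_warps`, and `compacted_warps` are all illustrative names invented here.

```python
# Illustrative sketch of thread compaction (not the paper's actual algorithm):
# reorder threads so that threads taking the same branch direction share a warp.

WARP_SIZE = 4

def warps(threads):
    """Split a list of thread ids into fixed-size SIMD groups."""
    return [threads[i:i + WARP_SIZE] for i in range(0, len(threads), WARP_SIZE)]

def divergent_warps(threads, predicate):
    """Without compaction: count warps whose threads disagree on the branch.
    Each such warp must execute both sides of the branch serially."""
    return sum(1 for w in warps(threads) if len({predicate(t) for t in w}) > 1)

def compacted_warps(threads, predicate):
    """With compaction: sort threads by branch direction first, so at most
    a single boundary warp can still mix both directions."""
    reordered = sorted(threads, key=predicate)
    return sum(1 for w in warps(reordered) if len({predicate(t) for t in w}) > 1)

# Example: an odd/even branch makes every warp diverge without compaction.
threads = list(range(16))
is_odd = lambda t: t % 2
print(divergent_warps(threads, is_odd))   # every warp mixes odd and even lanes
print(compacted_warps(threads, is_odd))   # after reordering, no warp diverges
```

The sketch only counts divergent warps; a real execution manager must also preserve per-thread state and memory-access behavior when it migrates threads between warps, which is why the "freedom to rearrange threads" mentioned in the citation is the key property.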
“…(3) The execution manager [Kerr et al. 2012] or its configuration could be an inappropriate choice for a baseline. Although unexpected (see the discussion in Section 2.1), other approaches (see Section 4) could exhibit better performance.…”
Section: Threats To Validity
confidence: 99%
“…Their technique finds parallelism at the level of work-item coalescing loops. Kerr et al. [15] propose a similar technique for CUDA kernels. We leave as our future work implementing auto-vectorization techniques in our framework and evaluating their performance on ARM processors.…”
Section: Related Work
confidence: 99%
“…The prior studies proposed methods to compile and execute applications written in OpenCL [10,12,18] or other accelerator programming models such as CUDA [11,15,24] on multicore CPUs. But they target x86 processors, while our work focuses on ARM processors for embedded systems.…”
Section: Introduction
confidence: 99%
“…Kerr et al. [14] implement a thread-invariant expression elimination pass, also based on [26]. The focus of their optimization pass is different than ours; they use common subexpression elimination on invariants after vectorization, whereas we allocate invariants to scalar register.…”
Section: Related Work
confidence: 99%
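The last citation contrasts two ways of handling thread-invariant expressions after vectorization. As a minimal sketch of the underlying idea, assume a kernel where some subexpression does not depend on the lane (thread) index; computing it once in a scalar and reusing it across all lanes is the effect both approaches aim for. The function names `kernel_naive` and `kernel_hoisted` are invented for illustration and do not come from either paper.

```python
# Hypothetical illustration of thread-invariant expression hoisting.

def kernel_naive(xs, a, b):
    # 'a * b' does not depend on the per-lane value x, yet it is
    # (conceptually) recomputed in every vector lane.
    return [x + a * b for x in xs]

def kernel_hoisted(xs, a, b):
    # The thread-invariant expression is computed once and kept scalar,
    # so the vectorized loop body only does the per-lane work.
    inv = a * b
    return [x + inv for x in xs]

# Both kernels compute the same result; the hoisted version models
# allocating the invariant to a scalar register.
print(kernel_hoisted([1, 2, 3], 4, 5))
```

In a real vectorizer the payoff is fewer vector instructions and lower register pressure in the SIMD loop body, which is why both cited passes target invariants specifically.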