2012
DOI: 10.1007/978-3-642-28652-0_1

Improving Performance of OpenCL on CPUs

Abstract: Data-parallel languages like OpenCL and CUDA are an important means to exploit the computational power of today's computing devices. In this paper, we deal with two aspects of implementing such languages on CPUs: First, we present a static analysis and an accompanying optimization to exclude code regions from control-flow to dataflow conversion, which is the commonly used technique to leverage vector instruction sets. Second, we present a novel technique to implement barrier synchronization. We evaluate our te…
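The abstract does not say how the barrier technique works. For reference, one general way to honour `barrier()` on a CPU that runs the work-items of a group one after another is to split the kernel at each barrier and run every region for all work-items before any work-item starts the next region, saving values that are live across the barrier per work-item. The Python sketch below emulates this for a hypothetical one-barrier kernel. The kernel, `region0`, `region1`, `run_work_group`, and `run_naive` are illustrative assumptions, not the paper's implementation, and the sketch ignores barriers inside loops or conditionals.

```python
# Hypothetical work-group kernel, in OpenCL-style pseudocode:
#   tmp = in[lid] * 2;
#   buf[lid] = tmp;                      // buf is __local memory
#   barrier(CLK_LOCAL_MEM_FENCE);
#   out[lid] = buf[(lid + 1) % n] + tmp;
#
# The kernel is split at the barrier into two regions. "tmp" is live across
# the barrier, so region 0 saves it per work-item and region 1 reloads it.

def region0(lid, inp, buf, saved):
    """Code before the barrier."""
    tmp = inp[lid] * 2
    buf[lid] = tmp
    saved[lid] = tmp  # keep the live value for this work-item


def region1(lid, n, buf, out, saved):
    """Code after the barrier."""
    tmp = saved[lid]  # reload the live value
    out[lid] = buf[(lid + 1) % n] + tmp


def run_work_group(inp):
    """Run every work-item through region 0 before any enters region 1,
    which is the ordering the barrier requires."""
    n = len(inp)
    buf, out, saved = [0] * n, [0] * n, [0] * n
    for lid in range(n):
        region0(lid, inp, buf, saved)
    for lid in range(n):
        region1(lid, n, buf, out, saved)
    return out


def run_naive(inp):
    """Incorrect: runs each work-item to completion and ignores the barrier,
    so most work-items read buf[(lid + 1) % n] before it has been written."""
    n = len(inp)
    buf, out, saved = [0] * n, [0] * n, [0] * n
    for lid in range(n):
        region0(lid, inp, buf, saved)
        region1(lid, n, buf, out, saved)
    return out
```

With input `[1, 2, 3, 4]`, `run_work_group` returns `[6, 10, 14, 10]`, while `run_naive` returns `[2, 4, 6, 10]` because it reads neighbouring `buf` entries that are still zero.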

Cited by 47 publications (27 citation statements)
References: 15 publications
“…A similar analysis has been proposed for optimizing OpenCL kernels for CPU (rather than GPU) performance [24]. The analyses were developed independently.…”
Section: Engineering Issues For Efficient Verification (mentioning; confidence: 99%)
“…However, these approaches have limitations, including the exclusion of instructions with side-effects from poly-path execution [18]. Karrenberg and Hack [12,13] propose compiler algorithms to map OpenCL kernels down to packed-SIMD units with explicit vector blend instructions.…”
Section: A Vector Machines (mentioning; confidence: 99%)
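Control-flow to dataflow conversion (also called if-conversion or linearization) replaces a branch whose direction may differ between SIMD lanes with code that executes both paths on all lanes and merges the results with a blend (select) that uses the branch condition as a mask. The Python sketch below emulates a 4-lane vector with lists. `scalar_kernel`, `vectorized_kernel`, `blend`, and `WIDTH` are hypothetical names, the list-based `blend` only stands in for a hardware blend instruction, and this shows the basic idea rather than the cited compiler algorithm.

```python
WIDTH = 4  # assumed SIMD width, e.g. four 32-bit lanes


def scalar_kernel(x):
    """Original per-work-item code with a data-dependent branch."""
    if x > 0:
        return x * 2
    return -x


def blend(mask, if_true, if_false):
    """Lane-wise select; stands in for a vector blend instruction."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def vectorized_kernel(xs):
    """Linearized version: one call processes WIDTH work-items at once."""
    if len(xs) != WIDTH:
        raise ValueError(f"expected {WIDTH} lanes, got {len(xs)}")
    mask = [x > 0 for x in xs]        # branch condition becomes a lane mask
    then_vals = [x * 2 for x in xs]   # 'then' path runs on every lane
    else_vals = [-x for x in xs]      # 'else' path also runs on every lane
    return blend(mask, then_vals, else_vals)
```

For example, `vectorized_kernel([3, -1, 0, 5])` returns `[6, 1, 0, 10]`, matching `scalar_kernel` applied to each element. The price is that both paths always execute; a path with side effects such as stores would additionally need masked memory operations.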
“…Karrenberg and Hack [13] describe a similar static branch-uniformity optimization to reduce register pressure while vectorizing OpenCL kernels for packed-SIMD units in x86 processors. Reducing register pressure is especially important on an x86 processor, as it only has a limited number of scalar and vector registers.…”
Section: Linearizing the Control Flow (mentioning; confidence: 99%)
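A branch-uniformity analysis decides which branch conditions are guaranteed to be identical for all work-items executed together; such branches can keep real control flow instead of being linearized, so only one path runs and only its values need registers. The Python sketch below is a deliberately simplified forward propagation over a hypothetical mini-IR: values derived from work-item ID queries are varying, everything else is uniform. The IR, opcode names, and `varying_values` are assumptions for illustration, and the sketch ignores control dependence (values merged after a divergent branch) and loops, which a sound analysis must handle.

```python
# Hypothetical mini-IR: (dest, opcode, operands). Names that never appear as
# a dest (kernel arguments such as "n" and "inp", constants such as "zero")
# are treated as uniform.
VARYING_SOURCES = {"get_global_id", "get_local_id"}


def varying_values(instrs):
    """Return the set of values that may differ between work-items.

    Simplification: straight-line code in definition order only; no phis,
    loops, or control dependence."""
    varying = set()
    for dest, op, operands in instrs:
        if op in VARYING_SOURCES or any(v in varying for v in operands):
            varying.add(dest)
    return varying


kernel = [
    ("gid", "get_global_id", ["dim0"]),
    ("addr", "gep", ["inp", "gid"]),
    ("x", "load", ["addr"]),
    ("c_uniform", "icmp_sgt", ["n", "zero"]),  # depends only on a kernel argument
    ("c_varying", "icmp_sgt", ["x", "zero"]),  # depends on per-work-item data
]

varying = varying_values(kernel)
for cond in ("c_uniform", "c_varying"):
    action = "linearize (mask + blend)" if cond in varying else "keep as a real branch"
    print(f"{cond}: {action}")
```

Here `c_uniform` is reported as a branch to keep and `c_varying` as one to linearize.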
“…It was initially designed for GPGPU architectures [4,6], and it can also be mapped efficiently to general-purpose CPUs [23]. Recent studies attempt to use OpenCL for more diverse target architectures, such as FPGAs [18] and ASIPs [21].…”
Section: Related Work (mentioning; confidence: 99%)