Compiling C/C++ SIMD Extensions for Function and Loop Vectorization on Multicore-SIMD Processors

Tian, Xin; Saito, Hideki; Girkar, Milind; Preis, S.; Kozhukhov, Sergey S.; Cherkasov, Aleksei G.; Nelson, Clark; Panchenko, N. V.; Geva, Robert

doi:10.1109/ipdpsw.2012.292

Cited by 20 publications

(19 citation statements)

References 7 publications

“…In some cases, loop transformations may eliminate control flow differences [Hanxleden and Kennedy, 1992]. It's also possible for control flow to be consistent across data elements, but for there to be challenges in determining this, such as function calls; sophisticated analysis may be required in these situations [Tian et al, 2012]. The compiler may instead use some conditional SIMD operations to convert control flow into data flow, enabling SIMD execution at the cost of some efficiency [Bik et al, 2002].…”

Section: Determining If All Data Items Have the Same Control Flow (mentioning)

confidence: 99%

Single-Instruction Multiple-Data Execution

Hughes

2015

Synthesis Lectures on Computer Architecture
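The conversion of control flow into data flow mentioned in the citation statement above can be illustrated with a minimal C sketch. It is not taken from any of the cited works; the function names `clamp_branchy` and `clamp_select` are hypothetical, and the mask-and-select idiom stands in for whatever conditional SIMD operations a given compiler actually emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy form: each element may follow a different control path,
   which makes the loop harder to execute in SIMD lockstep. */
void clamp_branchy(const int32_t *in, int32_t *out, size_t n, int32_t limit) {
    for (size_t i = 0; i < n; i++) {
        if (in[i] > limit)
            out[i] = limit;
        else
            out[i] = in[i];
    }
}

/* If-converted form: both candidate values are available for every
   element and a per-element mask selects between them, so every
   iteration runs the same instruction sequence. */
void clamp_select(const int32_t *in, int32_t *out, size_t n, int32_t limit) {
    for (size_t i = 0; i < n; i++) {
        int32_t mask = -(int32_t)(in[i] > limit); /* all ones if true, else 0 */
        out[i] = (limit & mask) | (in[i] & ~mask);
    }
}
```

Both functions produce the same output. The second form does some redundant work per element, which is the efficiency cost the statement refers to.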

“…Details about the specific compiler techniques are available in Tian et al. [26]. The compiler directives we add to the code and command line are the following:…”

Section: Bridging the Ninja Gap (mentioning)

confidence: 99%

Can traditional programming bridge the ninja performance gap for parallel computing applications?

et al. 2015

Self Cite

“…We focus on aggressive inter-iteration parallelism, consistent with presence of wide SIMD lanes. Tian et al [26] provided an extension to the current directive vectorization methods to support function call. ISPC [22] provides a compiler based solution supporting function calls, SOA data structure, and control flow.…”

Section: Related Work (mentioning)

confidence: 99%

A programming system for Xeon Phis with runtime SIMD parallelization

Huo

Ren

Agrawal

2014

Proceedings of the 28th ACM International Conference on Supercomputing

The Intel Xeon Phi offers a promising solution to coprocessing, since it is based on the popular x86 instruction set. However, to fully utilize its potential, applications must be vectorized to leverage the wide SIMD lanes, in addition to effective large-scale shared memory parallelism. Compared to the SIMT execution model on GPGPUs with CUDA or OpenCL, SIMD parallelism with a SSE-like instruction set imposes many restrictions, and has generally not benefitted applications involving branches, irregular accesses, or even reductions in the past. In this paper, we consider the problem of accelerating applications involving different communication patterns on Xeon Phis, with an emphasis on effectively using available SIMD parallelism. We offer an API for both shared memory and SIMD parallelization, and demonstrate its implementation. We use implementations of overloaded functions as a mechanism for providing SIMD code, which is assisted by runtime data reordering and our methods to effectively manage control flow. Our extensive evaluation with 6 popular applications shows large gains over the SIMD parallelization achieved by the production (ICC) compiler, and we even outperform OpenMP for MIMD parallelism.
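The citation statement above credits Tian et al. with extending directive-based vectorization to function calls. As a rough illustration of that idea, the sketch below uses the standard OpenMP 4.0 `declare simd` and `simd` directives as a stand-in; it does not claim to reproduce the exact syntax or implementation described in the cited paper, and the names `scale_and_clip` and `apply_gain` are hypothetical.

```c
/* Ask the compiler to also generate a SIMD variant of this function,
   so a vectorized loop can call it on several elements at once.
   uniform(gain) states that gain is the same for all SIMD lanes. */
#pragma omp declare simd uniform(gain)
static inline float scale_and_clip(float x, float gain) {
    float y = x * gain;
    return y > 1.0f ? 1.0f : y;
}

/* The simd directive requests vectorization of the loop; the call
   inside the body can then use the SIMD variant declared above. */
void apply_gain(const float *in, float *out, int n, float gain) {
    #pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = scale_and_clip(in[i], gain);
}
```

Compilers that support these directives typically need a flag such as `-fopenmp-simd` (GCC, Clang) to honor them. Without it the pragmas are ignored and the code still runs correctly as scalar code.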
