Proceedings of the 7th International Conference on Principles and Practice of Programming in Java 2009
DOI: 10.1145/1596655.1596670
Automatic parallelization for graphics processing units

Abstract: Accelerated graphics cards, or Graphics Processing Units (GPUs), have become ubiquitous in recent years. On the right kinds of problems, GPUs greatly surpass CPUs in terms of raw performance. However, because they are difficult to program, GPUs are used only for a narrow class of special-purpose applications; the raw processing power made available by GPUs is unused most of the time. This paper presents an extension to a Java JIT compiler that executes suitable code on the GPU instead of the CPU. Both static an…


Cited by 27 publications (25 citation statements)
References 29 publications
“…It will be even harder to extract the fine-grained parallelism necessary for efficient use of many-core systems like GPUs with thousands of threads. Therefore, several automatic static parallelization techniques for GPUs have been proposed to exploit more parallelism [Han and Abdelrahman 2010; Baskaran et al. 2010; Wolfe 2010; Leung et al. 2009; …].…”
Section: Motivation
confidence: 99%
“…There are previous works that have focused on generating CUDA code from sequential input [Han and Abdelrahman 2010; Baskaran et al. 2010; Wolfe 2010; Leung et al. 2009; …]. HiCUDA [Han and Abdelrahman 2010] is a high-level directive-based compiler framework for CUDA programming in which programmers insert directives into sequential C code to define the boundaries of kernel functions.…”
Section: Related Work
confidence: 99%
“…Some of the more recent such research aims to effectively employ powerful dedicated and specialized co-processors like graphics cards. Notable VM designs in this direction are the CellVM [15], [30] for the Cell Broadband Engine; a VM with a distributed Java heap on a homogeneous TILE-64 system [28]; an extension of the JikesVM to detect and offload loops on CUDA devices [12]; and VMs for Intel's Larrabee GPGPU architecture [22]. Most of these designs are specific to VMs for Java-like languages, which are based on a shared-memory concurrent programming model.…”
Section: VM Support For High-level Concurrency
confidence: 99%
“…However, in 2006 NVIDIA introduced a programming environment called CUDA [25], which allows the GPU to be programmed through more traditional means. At this moment a dedicated programming effort is still required to develop algorithms that perform efficiently on GPU hardware, but efforts are underway for automatic transformation of CPU programs into GPU counterparts [24].…”
Section: General Purpose Computation On Graphics Hardware
confidence: 99%